When to Automate

Consider automation when you need to:

  • Submit many jobs with different input files or parameters
  • Generate batch scripts dynamically based on data
  • Create complex job workflows with dependencies
  • Run the same analysis on multiple datasets

Note: For simple parameter sweeps, array jobs are often easier than scripted submissions.
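For example, the temperature sweep shown later in this page could be a single array job instead of a loop of submissions. A minimal sketch (the script name and echo payload are illustrative; a default index is used so the script also runs outside Slurm):

```shell
#!/bin/bash
# sweep_array.sh (sketch) - array-job alternative to a submission loop
#SBATCH --job-name=sweep
#SBATCH --output=logs/sweep_%a.out
#SBATCH --array=1-5

# Derive the parameter from the task index: tasks 1-5 -> temperatures 100-500
TEMP=$(( ${SLURM_ARRAY_TASK_ID:-1} * 100 ))
echo "Running with temperature $TEMP"
```

One sbatch call creates all five tasks, and the scheduler tracks them as a single job.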

Shell Script Loops

The simplest automation is a shell loop that submits multiple jobs:

Submit jobs for multiple input files

#!/bin/bash
# submit_all.sh - Submit a job for each input file

for file in data/*.csv; do
    filename=$(basename "$file" .csv)
    sbatch --job-name="process_${filename}" \
           --output="logs/${filename}.out" \
           --error="logs/${filename}.err" \
           process_job.sh "$file"
done

The batch script process_job.sh receives the filename as $1:

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --time=01:00:00

INPUT_FILE=$1
OUTPUT_FILE="results/$(basename "$INPUT_FILE" .csv)_result.csv"

./analyze.py --input "$INPUT_FILE" --output "$OUTPUT_FILE"

Parameter sweep with loops

#!/bin/bash
# submit_sweep.sh - Parameter sweep

for temp in 100 200 300 400 500; do
    for pressure in 1 5 10 50 100; do
        sbatch --job-name="sim_T${temp}_P${pressure}" \
               --output="logs/sim_T${temp}_P${pressure}.out" \
               simulation.sh $temp $pressure
    done
done

Generating Batch Scripts

For more complex jobs, generate the entire batch script dynamically:

Shell script generator

#!/bin/bash
# generate_and_submit.sh

for i in $(seq 1 10); do
    # Generate batch script
    cat > job_${i}.sh << EOF
#!/bin/bash
#SBATCH --job-name=run_${i}
#SBATCH --output=logs/run_${i}.out
#SBATCH --error=logs/run_${i}.err
#SBATCH --ntasks=1
#SBATCH --time=02:00:00

echo "Running iteration ${i}"
./myprogram --iteration ${i} --seed $RANDOM  # seed fixed at generation time (unquoted EOF)
EOF

    # Submit the generated script
    sbatch job_${i}.sh
done

Python script generator

#!/usr/bin/env python3
# generate_jobs.py

import subprocess
import os

parameters = [
    {'name': 'small', 'size': 100, 'time': '01:00:00'},
    {'name': 'medium', 'size': 1000, 'time': '04:00:00'},
    {'name': 'large', 'size': 10000, 'time': '12:00:00'},
]

os.makedirs('generated_scripts', exist_ok=True)
os.makedirs('logs', exist_ok=True)

for param in parameters:
    script_content = f"""#!/bin/bash
#SBATCH --job-name={param['name']}
#SBATCH --output=logs/{param['name']}.out
#SBATCH --error=logs/{param['name']}.err
#SBATCH --ntasks=4
#SBATCH --time={param['time']}

./simulation --size {param['size']} --output results/{param['name']}.dat
"""

    script_path = f"generated_scripts/{param['name']}.sh"
    with open(script_path, 'w') as f:
        f.write(script_content)

    # Submit the job
    result = subprocess.run(['sbatch', script_path], capture_output=True, text=True)
    print(f"Submitted {param['name']}: {result.stdout.strip()}")

Job Dependencies

Chain jobs together so they run in sequence:

Linear pipeline

#!/bin/bash
# submit_pipeline.sh

# Submit first job, capture job ID
JOB1=$(sbatch --parsable preprocess.sh)
echo "Submitted preprocessing: $JOB1"

# Submit second job, depends on first
JOB2=$(sbatch --parsable --dependency=afterok:$JOB1 analyze.sh)
echo "Submitted analysis: $JOB2"

# Submit third job, depends on second
JOB3=$(sbatch --parsable --dependency=afterok:$JOB2 postprocess.sh)
echo "Submitted postprocessing: $JOB3"

Fan-out, fan-in pattern

#!/bin/bash
# submit_fanout.sh - Run multiple jobs, then merge results

# Submit parallel processing jobs
JOBS=""
for i in $(seq 1 10); do
    JOB=$(sbatch --parsable process_chunk.sh $i)
    JOBS="${JOBS}:${JOB}"
done
# Remove leading colon
JOBS=${JOBS#:}

echo "Submitted processing jobs: $JOBS"

# Submit merge job that waits for all processing jobs
MERGE_JOB=$(sbatch --parsable --dependency=afterok:$JOBS merge_results.sh)
echo "Submitted merge job: $MERGE_JOB"

Dependency types

Option                           Meaning
--dependency=afterok:JOBID       Run after JOBID completes successfully
--dependency=afterany:JOBID      Run after JOBID completes (success or failure)
--dependency=afternotok:JOBID    Run only if JOBID fails
--dependency=after:JOBID         Run after JOBID starts
--dependency=singleton           Run only one job with this name at a time
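When a job depends on several predecessors, the IDs are joined with colons, as in the fan-out example above. A small helper function (hypothetical, not part of Slurm) can build the flag from a dependency type and any number of job IDs:

```shell
#!/bin/bash
# make_dependency TYPE ID... - print a --dependency flag, e.g.
# make_dependency afterok 101 102 -> --dependency=afterok:101:102
make_dependency() {
    local type=$1; shift
    local joined
    # Join the remaining arguments with colons via IFS in a subshell
    joined=$(IFS=:; printf '%s' "$*")
    printf -- '--dependency=%s:%s\n' "$type" "$joined"
}
```

Usage: `sbatch $(make_dependency afterok "$JOB1" "$JOB2") merge_results.sh`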

Reading Parameters from Files

For large parameter sets, read from a configuration file:

CSV parameter file

Create parameters.csv:

name,temperature,pressure,iterations
run1,100,1.0,1000
run2,200,1.5,2000
run3,300,2.0,3000

Submit jobs from CSV:

#!/bin/bash
# submit_from_csv.sh

# Skip header line, read each row
tail -n +2 parameters.csv | while IFS=, read -r name temp pressure iters; do
    sbatch --job-name="$name" \
           --output="logs/${name}.out" \
           --export=ALL,TEMP=$temp,PRESSURE=$pressure,ITERATIONS=$iters \
           simulation.sh
done
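With --export, the values arrive in the job as environment variables rather than positional arguments. A sketch of the receiving simulation.sh (the ./simulate command is illustrative; defaults let the script run standalone):

```shell
#!/bin/bash
# simulation.sh (sketch) - reads parameters exported by the submit loop
#SBATCH --ntasks=1
#SBATCH --time=01:00:00

# TEMP, PRESSURE, ITERATIONS come from sbatch --export; defaults for testing
TEMP=${TEMP:-100}
PRESSURE=${PRESSURE:-1.0}
ITERATIONS=${ITERATIONS:-1000}

echo "T=$TEMP P=$PRESSURE iters=$ITERATIONS"
# ./simulate --temperature "$TEMP" --pressure "$PRESSURE" --steps "$ITERATIONS"
```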

Python with CSV

#!/usr/bin/env python3
import csv
import subprocess

with open('parameters.csv') as f:
    reader = csv.DictReader(f)
    for row in reader:
        cmd = [
            'sbatch',
            f'--job-name={row["name"]}',
            f'--output=logs/{row["name"]}.out',
            f'--export=ALL,TEMP={row["temperature"]},PRESSURE={row["pressure"]}',
            'simulation.sh'
        ]
        result = subprocess.run(cmd, capture_output=True, text=True)
        print(f'{row["name"]}: {result.stdout.strip()}')

Conditional Submissions

Submit jobs based on conditions:

#!/bin/bash
# submit_missing.sh - Only submit jobs for missing output files

for input in data/*.dat; do
    base=$(basename "$input" .dat)
    output="results/${base}_result.dat"

    if [ ! -f "$output" ]; then
        echo "Submitting job for $base (output missing)"
        sbatch --job-name="$base" process.sh "$input"
    else
        echo "Skipping $base (output exists)"
    fi
done

Tracking Submitted Jobs

Keep a log of submitted jobs:

#!/bin/bash
# submit_with_log.sh

LOGFILE="submission_log_$(date +%Y%m%d_%H%M%S).txt"
echo "Submission started: $(date)" > "$LOGFILE"

for file in data/*.csv; do
    JOBID=$(sbatch --parsable process.sh "$file")
    echo "$JOBID,$file,$(date +%Y-%m-%d_%H:%M:%S)" >> "$LOGFILE"
done

echo "Submission complete: $(date)" >> "$LOGFILE"
echo "Jobs logged to $LOGFILE"

Rate Limiting

Avoid overwhelming the scheduler with too many submissions at once:

#!/bin/bash
# submit_with_delay.sh

for file in data/*.csv; do
    sbatch process.sh "$file"
    sleep 0.5  # Half-second delay between submissions
done

For very large submissions (1000+ jobs), consider using array jobs with a concurrency limit (--array=1-1000%50) instead.
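An array job can cover the many-input-files case too: build a file list once, then let each task pick its line by index. A sketch (file_list.txt and ./process are illustrative names):

```shell
#!/bin/bash
# process_array.sh (sketch) - one array task per line of a pre-built file list
#SBATCH --array=1-1000%50
#SBATCH --output=logs/chunk_%a.out

# pick_input INDEX LISTFILE - print line INDEX of LISTFILE
pick_input() {
    sed -n "${1}p" "$2"
}

# In the real job: ./process "$(pick_input "$SLURM_ARRAY_TASK_ID" file_list.txt)"
```

Generate the list with, e.g., `ls data/*.csv > file_list.txt` before submitting.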

Best Practices

  • Create directories first: Ensure log and output directories exist before submitting
    mkdir -p logs results
    ./submit_all.sh
    
  • Test with one job: Verify your script works before submitting hundreds of jobs
  • Use --parsable: When capturing job IDs, use sbatch --parsable for clean output
  • Quote variables: Always quote file paths and parameters to handle spaces correctly
  • Prefer array jobs: For simple parameter sweeps, array jobs are more efficient than scripted loops
  • Check queue limits: Be aware of QOS limits on concurrent jobs
  • Keep submission scripts: Save your submission scripts for reproducibility
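A dry-run mode makes the "test first" advice easy to follow: wrap sbatch so a submission script can be previewed before anything reaches the scheduler. A sketch (the submit wrapper and DRY_RUN variable are illustrative conventions, not Slurm features):

```shell
#!/bin/bash
# Dry-run wrapper: with DRY_RUN=1 (the default here), print instead of submit
DRY_RUN=${DRY_RUN:-1}

submit() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "DRY RUN: sbatch $*"
    else
        sbatch "$@"
    fi
}

for file in data/*.csv; do
    submit --job-name="$(basename "$file" .csv)" process.sh "$file"
done
```

Run once with the default to inspect the commands, then rerun with DRY_RUN=0 to submit for real.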

Further Resources