# Slurm Array Jobs
How to submit many similar jobs efficiently using Slurm job arrays.
## What Are Array Jobs?
Array jobs let you submit many similar jobs with a single script. Each array task runs independently with a unique index that you can use to process different input files, parameters, or data subsets.
Use array jobs when you need to:
- Process many input files with the same program
- Run parameter sweeps
- Perform Monte Carlo simulations
- Execute embarrassingly parallel workloads
## Basic Syntax

```bash
#!/bin/bash
#SBATCH --job-name=my_array
#SBATCH --output=output_%A_%a.out   # %A = array job ID, %a = task index
#SBATCH --error=error_%A_%a.err
#SBATCH --array=1-100               # Create 100 tasks with indices 1-100
#SBATCH --ntasks=1
#SBATCH --time=01:00:00

# $SLURM_ARRAY_TASK_ID contains the current task index (1-100)
echo "Processing task $SLURM_ARRAY_TASK_ID"
./myprogram input_${SLURM_ARRAY_TASK_ID}.dat
```
## Array Index Patterns

| Pattern | Indices Created | Use Case |
|---|---|---|
| `--array=1-100` | 1, 2, 3, ... 100 | Simple sequential range |
| `--array=0-99` | 0, 1, 2, ... 99 | Zero-based indexing |
| `--array=1,3,5,7` | 1, 3, 5, 7 | Specific values |
| `--array=1-100:2` | 1, 3, 5, ... 99 | Step of 2 (odd indices) |
| `--array=0-100:10` | 0, 10, 20, ... 100 | Step of 10 |
| `--array=1-1000%50` | 1-1000, at most 50 running | Limit concurrent tasks |
## Environment Variables

| Variable | Description | Example Value |
|---|---|---|
| `$SLURM_ARRAY_JOB_ID` | Main array job ID | 123456 |
| `$SLURM_ARRAY_TASK_ID` | Current task index | 42 |
| `$SLURM_ARRAY_TASK_COUNT` | Total number of tasks | 100 |
| `$SLURM_ARRAY_TASK_MIN` | Minimum task index | 1 |
| `$SLURM_ARRAY_TASK_MAX` | Maximum task index | 100 |
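These variables make it easy to split a fixed pool of work items evenly across tasks. A minimal sketch, assuming 1000 work items; the fallback values below stand in for what Slurm would set inside a real array task:

```shell
#!/bin/bash
# Fallbacks are assumptions for illustration; Slurm sets the real values.
SLURM_ARRAY_TASK_ID=${SLURM_ARRAY_TASK_ID:-42}
SLURM_ARRAY_TASK_MIN=${SLURM_ARRAY_TASK_MIN:-1}
SLURM_ARRAY_TASK_COUNT=${SLURM_ARRAY_TASK_COUNT:-100}

N_ITEMS=1000   # total work items (assumption for illustration)

# Ceiling division: items handled per task
PER_TASK=$(( (N_ITEMS + SLURM_ARRAY_TASK_COUNT - 1) / SLURM_ARRAY_TASK_COUNT ))
START=$(( (SLURM_ARRAY_TASK_ID - SLURM_ARRAY_TASK_MIN) * PER_TASK ))
END=$(( START + PER_TASK - 1 ))
echo "Task $SLURM_ARRAY_TASK_ID handles items $START-$END"
```

With 100 tasks and 1000 items, each task gets a contiguous block of 10 items, so no two tasks touch the same data.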
## Output File Naming

Use these placeholders in `--output` and `--error`:

- `%A` - Array job ID (same for all tasks)
- `%a` - Array task index (unique per task)
- `%j` - Individual job ID (unique per task)

```bash
#SBATCH --output=results/job_%A_task_%a.out
#SBATCH --error=results/job_%A_task_%a.err
```
## Example: Processing Multiple Input Files

```bash
#!/bin/bash
#SBATCH --job-name=process_files
#SBATCH --output=logs/process_%A_%a.out
#SBATCH --array=1-50
#SBATCH --ntasks=1
#SBATCH --time=02:00:00

# Process the file corresponding to this task index
INPUT_FILE="data/sample_${SLURM_ARRAY_TASK_ID}.csv"
OUTPUT_FILE="results/output_${SLURM_ARRAY_TASK_ID}.csv"

./analyze.py --input "$INPUT_FILE" --output "$OUTPUT_FILE"
```
## Example: Parameter Sweep

```bash
#!/bin/bash
#SBATCH --job-name=param_sweep
#SBATCH --output=sweep_%A_%a.out
#SBATCH --array=0-99
#SBATCH --ntasks=1
#SBATCH --time=01:00:00

# Define parameter values
TEMPS=(100 200 300 400 500 600 700 800 900 1000)
PRESSURES=(1 2 3 4 5 6 7 8 9 10)

# Map the task index (0-99) to a (temperature, pressure) pair
TEMP_IDX=$((SLURM_ARRAY_TASK_ID / 10))
PRESS_IDX=$((SLURM_ARRAY_TASK_ID % 10))
TEMP=${TEMPS[$TEMP_IDX]}
PRESSURE=${PRESSURES[$PRESS_IDX]}

echo "Running simulation: T=$TEMP, P=$PRESSURE"
./simulate --temp "$TEMP" --pressure "$PRESSURE" --output "result_${TEMP}_${PRESSURE}.dat"
```
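Because the div/mod mapping is plain bash arithmetic, you can sanity-check it locally before submitting, with no Slurm involved. A quick check over a few representative indices:

```shell
#!/bin/bash
# Local check of the index-to-parameter mapping used in the sweep script.
TEMPS=(100 200 300 400 500 600 700 800 900 1000)
PRESSURES=(1 2 3 4 5 6 7 8 9 10)

for ID in 0 9 10 99; do
    # Array subscripts are arithmetic contexts, so the div/mod can go inline
    echo "task $ID -> T=${TEMPS[ID / 10]} P=${PRESSURES[ID % 10]}"
done
```

Task 0 maps to T=100, P=1 and task 99 to T=1000, P=10, confirming that indices 0-99 cover all 100 combinations exactly once.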
## Example: Using a File List

```bash
#!/bin/bash
#SBATCH --job-name=filelist
#SBATCH --output=logs/%A_%a.out
#SBATCH --array=1-100
#SBATCH --ntasks=1
#SBATCH --time=01:00:00

# Read the Nth line of the file list, where N is the task index
INPUT_FILE=$(sed -n "${SLURM_ARRAY_TASK_ID}p" file_list.txt)

echo "Processing: $INPUT_FILE"
./process.py "$INPUT_FILE"
```
Create `file_list.txt` with one filename per line:

```bash
$ ls data/*.dat > file_list.txt
$ wc -l file_list.txt   # Check how many files
100 file_list.txt
```
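The array range must match the number of lines in the list; if they drift apart, extra tasks read an empty filename. One way to keep them in sync is to count the lines at submission time and pass the range on the command line, since sbatch command-line options override `#SBATCH` lines in the script. A sketch (the sample list, the script name `filelist_job.sh`, and the `echo` are assumptions to keep it self-contained; drop the `echo` to actually submit):

```shell
#!/bin/bash
# Create a small sample list so the sketch runs anywhere (assumption).
printf '%s\n' data/a.dat data/b.dat data/c.dat > file_list.txt

# Size the array to the list; command-line options override #SBATCH lines.
N=$(wc -l < file_list.txt)
echo sbatch --array="1-$N" filelist_job.sh
```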
## Limiting Concurrent Tasks

Use `%N` to limit how many array tasks run simultaneously:

```bash
#SBATCH --array=1-1000%50   # At most 50 tasks running at once
```
This is useful when:
- Tasks share a limited resource (database, license server)
- You don't want to dominate the queue
- Tasks write to shared storage and you want to limit I/O
## Managing Array Jobs

### Check status

```bash
# View all tasks
squeue -j JOBID

# View a specific task
squeue -j JOBID_42
```

### Cancel array jobs

```bash
# Cancel the entire array
scancel JOBID

# Cancel a specific task
scancel JOBID_42

# Cancel a range of tasks (quoted so the shell doesn't expand the brackets)
scancel "JOBID_[1-50]"
```

### Hold/release tasks

```bash
# Hold the remaining pending tasks
scontrol hold JOBID

# Release the held tasks
scontrol release JOBID
```
## Rerunning Failed Tasks

If some tasks fail, you can rerun only those tasks:

```bash
# Check which tasks failed
sacct -j JOBID --format=JobID,State | grep FAILED

# Resubmit with only the failed indices in the job script
#SBATCH --array=5,17,42,89
```
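Typing out the failed indices by hand gets tedious for large arrays. A sketch of extracting them automatically, assuming sacct's parsable output; the sample text below stands in for real `sacct -n -P -j JOBID --format=JobID,State` output:

```shell
#!/bin/bash
# Sample lines standing in for: sacct -n -P -j JOBID --format=JobID,State
sacct_output='123456_1|COMPLETED
123456_5|FAILED
123456_17|FAILED
123456_42|FAILED'

# Keep FAILED lines, strip everything up to the task index, join with commas.
FAILED=$(printf '%s\n' "$sacct_output" \
    | awk -F'|' '$2 == "FAILED" { sub(/.*_/, "", $1); print $1 }' \
    | paste -sd, -)
echo "Resubmit with: sbatch --array=$FAILED array_job.sh"
```

Real sacct output also includes per-step lines such as `123456_5.batch`; passing `-X` (`--allocations`) to sacct suppresses them so only one line per task remains.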
## Combining with Job Dependencies

Run a job after all array tasks complete successfully:

```bash
# Submit the array job
$ sbatch array_job.sh
Submitted batch job 123456

# Submit a post-processing job that waits for all tasks
$ sbatch --dependency=afterok:123456 postprocess.sh
```
## Best Practices

- Create output directories first: Array tasks may fail if their output directories don't exist.

  ```bash
  mkdir -p logs results
  sbatch array_job.sh
  ```
- Use meaningful output filenames: Include both `%A` and `%a` to identify tasks
- Test with small arrays first: Before submitting `--array=1-10000`, test with `--array=1-5`
- Limit concurrent tasks: Use `%N` if tasks stress shared resources
- Handle missing inputs gracefully:

  ```bash
  INPUT_FILE="data/input_${SLURM_ARRAY_TASK_ID}.dat"
  if [ ! -f "$INPUT_FILE" ]; then
      echo "Input file not found: $INPUT_FILE" >&2
      exit 1
  fi
  ```