# Slurm Array Jobs
How to submit many similar jobs efficiently using Slurm job arrays.
## What Are Array Jobs?
Array jobs let you submit many similar jobs with a single script. Each array task runs independently with a unique index that you can use to process different input files, parameters, or data subsets.
Use array jobs when you need to:
- Process many input files with the same program
- Run parameter sweeps
- Perform Monte Carlo simulations
- Execute embarrassingly parallel workloads
## Basic Syntax

```bash
#!/bin/bash
#SBATCH --job-name=my_array
#SBATCH --output=output_%A_%a.out   # %A = array job ID, %a = task index
#SBATCH --error=error_%A_%a.err
#SBATCH --array=1-100               # Create 100 tasks with indices 1-100
#SBATCH --ntasks=1
#SBATCH --time=01:00:00

# $SLURM_ARRAY_TASK_ID contains the current task index (1-100)
echo "Processing task $SLURM_ARRAY_TASK_ID"
./myprogram input_${SLURM_ARRAY_TASK_ID}.dat
```
## Array Index Patterns

| Pattern | Indices Created | Use Case |
|---|---|---|
| `--array=1-100` | 1, 2, 3, ... 100 | Simple sequential range |
| `--array=0-99` | 0, 1, 2, ... 99 | Zero-based indexing |
| `--array=1,3,5,7` | 1, 3, 5, 7 | Specific values |
| `--array=1-100:2` | 1, 3, 5, ... 99 | Step of 2 (odd indices) |
| `--array=0-100:10` | 0, 10, 20, ... 100 | Step of 10 |
| `--array=1-1000%50` | 1-1000, at most 50 running | Limit concurrent tasks |
## Environment Variables

| Variable | Description | Example Value |
|---|---|---|
| `$SLURM_ARRAY_JOB_ID` | Main array job ID | 123456 |
| `$SLURM_ARRAY_TASK_ID` | Current task index | 42 |
| `$SLURM_ARRAY_TASK_COUNT` | Total number of tasks | 100 |
| `$SLURM_ARRAY_TASK_MIN` | Minimum task index | 1 |
| `$SLURM_ARRAY_TASK_MAX` | Maximum task index | 100 |
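These variables make it easy to split a fixed pool of work items evenly across tasks. A minimal sketch, assuming 1000 work items; the fallback values below stand in for what Slurm would set inside a real array task:

```shell
#!/bin/bash
# Fallbacks are assumptions for illustration; Slurm sets the real values.
SLURM_ARRAY_TASK_ID=${SLURM_ARRAY_TASK_ID:-42}
SLURM_ARRAY_TASK_MIN=${SLURM_ARRAY_TASK_MIN:-1}
SLURM_ARRAY_TASK_COUNT=${SLURM_ARRAY_TASK_COUNT:-100}

N_ITEMS=1000   # total work items (assumption for illustration)

# Ceiling division: items handled per task
PER_TASK=$(( (N_ITEMS + SLURM_ARRAY_TASK_COUNT - 1) / SLURM_ARRAY_TASK_COUNT ))
START=$(( (SLURM_ARRAY_TASK_ID - SLURM_ARRAY_TASK_MIN) * PER_TASK ))
END=$(( START + PER_TASK - 1 ))
echo "Task $SLURM_ARRAY_TASK_ID handles items $START-$END"
```

With 100 tasks and 1000 items, each task gets a contiguous block of 10 items, so no two tasks touch the same data.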
## Output File Naming

Use these placeholders in `--output` and `--error`:

- `%A` - Array job ID (same for all tasks)
- `%a` - Array task index (unique per task)
- `%j` - Individual job ID (unique per task)

```bash
#SBATCH --output=results/job_%A_task_%a.out
#SBATCH --error=results/job_%A_task_%a.err
```
## Example: Processing Multiple Input Files

```bash
#!/bin/bash
#SBATCH --job-name=process_files
#SBATCH --output=logs/process_%A_%a.out
#SBATCH --array=1-50
#SBATCH --ntasks=1
#SBATCH --time=02:00:00

# Process the file corresponding to this task index
INPUT_FILE="data/sample_${SLURM_ARRAY_TASK_ID}.csv"
OUTPUT_FILE="results/output_${SLURM_ARRAY_TASK_ID}.csv"

./analyze.py --input "$INPUT_FILE" --output "$OUTPUT_FILE"
```
## Example: Parameter Sweep

```bash
#!/bin/bash
#SBATCH --job-name=param_sweep
#SBATCH --output=sweep_%A_%a.out
#SBATCH --array=0-99
#SBATCH --ntasks=1
#SBATCH --time=01:00:00

# Define parameter values
TEMPS=(100 200 300 400 500 600 700 800 900 1000)
PRESSURES=(1 2 3 4 5 6 7 8 9 10)

# Map the task index (0-99) to a (temperature, pressure) pair
TEMP_IDX=$((SLURM_ARRAY_TASK_ID / 10))
PRESS_IDX=$((SLURM_ARRAY_TASK_ID % 10))
TEMP=${TEMPS[$TEMP_IDX]}
PRESSURE=${PRESSURES[$PRESS_IDX]}

echo "Running simulation: T=$TEMP, P=$PRESSURE"
./simulate --temp "$TEMP" --pressure "$PRESSURE" --output "result_${TEMP}_${PRESSURE}.dat"
```
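Because the div/mod mapping is plain bash arithmetic, you can sanity-check it locally before submitting, with no Slurm involved. A quick check over a few representative indices:

```shell
#!/bin/bash
# Local check of the index-to-parameter mapping used in the sweep script.
TEMPS=(100 200 300 400 500 600 700 800 900 1000)
PRESSURES=(1 2 3 4 5 6 7 8 9 10)

for ID in 0 9 10 99; do
    # Array subscripts are arithmetic contexts, so the div/mod can go inline
    echo "task $ID -> T=${TEMPS[ID / 10]} P=${PRESSURES[ID % 10]}"
done
```

Task 0 maps to T=100, P=1 and task 99 to T=1000, P=10, confirming that indices 0-99 cover all 100 combinations exactly once.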
## Example: Using a File List

```bash
#!/bin/bash
#SBATCH --job-name=filelist
#SBATCH --output=logs/%A_%a.out
#SBATCH --array=1-100
#SBATCH --ntasks=1
#SBATCH --time=01:00:00

# Read the Nth line of the file list, where N is the task index
INPUT_FILE=$(sed -n "${SLURM_ARRAY_TASK_ID}p" file_list.txt)

echo "Processing: $INPUT_FILE"
./process.py "$INPUT_FILE"
```
Create `file_list.txt` with one filename per line:

```bash
$ ls data/*.dat > file_list.txt
$ wc -l file_list.txt   # Check how many files
100 file_list.txt
```
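The array range must match the number of lines in the list; if they drift apart, extra tasks read an empty filename. One way to keep them in sync is to count the lines at submission time and pass the range on the command line, since sbatch command-line options override `#SBATCH` lines in the script. A sketch (the sample list, the script name `filelist_job.sh`, and the `echo` are assumptions to keep it self-contained; drop the `echo` to actually submit):

```shell
#!/bin/bash
# Create a small sample list so the sketch runs anywhere (assumption).
printf '%s\n' data/a.dat data/b.dat data/c.dat > file_list.txt

# Size the array to the list; command-line options override #SBATCH lines.
N=$(wc -l < file_list.txt)
echo sbatch --array="1-$N" filelist_job.sh
```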
## Limiting Concurrent Tasks

Use `%N` to limit how many array tasks run simultaneously:

```bash
#SBATCH --array=1-1000%50   # At most 50 tasks running at once
```
This is useful when:
- Tasks share a limited resource (database, license server)
- You don't want to dominate the queue
- Tasks write to shared storage and you want to limit I/O
## Managing Array Jobs

### Check status

```bash
# View all tasks
squeue -j JOBID

# View a specific task
squeue -j JOBID_42
```

### Cancel array jobs

```bash
# Cancel the entire array
scancel JOBID

# Cancel a specific task
scancel JOBID_42

# Cancel a range of tasks (quoted so the shell doesn't expand the brackets)
scancel "JOBID_[1-50]"
```

### Hold/release tasks

```bash
# Hold the remaining pending tasks
scontrol hold JOBID

# Release the held tasks
scontrol release JOBID
```
## Rerunning Failed Tasks

If some tasks fail, you can rerun only those tasks:

```bash
# Check which tasks failed
sacct -j JOBID --format=JobID,State | grep FAILED

# Resubmit with only the failed indices in the job script
#SBATCH --array=5,17,42,89
```
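Typing out the failed indices by hand gets tedious for large arrays. A sketch of extracting them automatically, assuming sacct's parsable output; the sample text below stands in for real `sacct -n -P -j JOBID --format=JobID,State` output:

```shell
#!/bin/bash
# Sample lines standing in for: sacct -n -P -j JOBID --format=JobID,State
sacct_output='123456_1|COMPLETED
123456_5|FAILED
123456_17|FAILED
123456_42|FAILED'

# Keep FAILED lines, strip everything up to the task index, join with commas.
FAILED=$(printf '%s\n' "$sacct_output" \
    | awk -F'|' '$2 == "FAILED" { sub(/.*_/, "", $1); print $1 }' \
    | paste -sd, -)
echo "Resubmit with: sbatch --array=$FAILED array_job.sh"
```

Real sacct output also includes per-step lines such as `123456_5.batch`; passing `-X` (`--allocations`) to sacct suppresses them so only one line per task remains.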
## Combining with Job Dependencies

Run a job after all array tasks complete successfully:

```bash
# Submit the array job
$ sbatch array_job.sh
Submitted batch job 123456

# Submit a post-processing job that waits for all tasks
$ sbatch --dependency=afterok:123456 postprocess.sh
```
## Best Practices

- Create output directories first: Array tasks may fail if their output directories don't exist.

  ```bash
  mkdir -p logs results
  sbatch array_job.sh
  ```
- Use meaningful output filenames: Include both `%A` and `%a` to identify tasks
- Test with small arrays first: Before submitting `--array=1-10000`, test with `--array=1-5`
- Limit concurrent tasks: Use `%N` if tasks stress shared resources
- Handle missing inputs gracefully:

  ```bash
  INPUT_FILE="data/input_${SLURM_ARRAY_TASK_ID}.dat"
  if [ ! -f "$INPUT_FILE" ]; then
      echo "Input file not found: $INPUT_FILE" >&2
      exit 1
  fi
  ```