Running Jobs with Slurm
Learn how to create, submit, and monitor jobs using the Slurm workload manager.
Submit batch jobs to run applications on compute nodes
The cluster uses the Slurm workload manager for scheduling jobs on compute nodes. Running applications directly on login nodes is not permitted. Users must submit a batch script or request an interactive session via Slurm.
A batch script is a text file containing resource requirements (cores, time, memory) and the commands to run your application. Slurm uses directives beginning with #SBATCH to specify these requirements.
Quick Links
- Examples and templates
- Available resources
- Advanced
- Migrating from LSF?
Step 1: Create batch script
Serial job
The following batch script run_mycode.sh runs a serial application mycode.exe:
```bash
#!/bin/bash
#SBATCH --job-name=mycode
#SBATCH --output=stdout.%j
#SBATCH --error=stderr.%j
#SBATCH --ntasks=1
#SBATCH --time=02:00:00

./mycode.exe
```
The #SBATCH directives specify job parameters:
- --ntasks=1 requests a single task (one CPU core by default)
- --time=02:00:00 sets a 2-hour time limit
- --job-name sets the job name displayed by squeue
- --output and --error specify where stdout and stderr are written (%j is replaced by the job ID)
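The %j substitution can be illustrated with a short sketch (the job ID below is an assumed example value, not a real job):

```shell
# Illustration of the %j placeholder: with --output=stdout.%j and
# --error=stderr.%j, a job assigned ID 12345 writes these two files.
jobid=12345                     # hypothetical example job ID
output_file="stdout.${jobid}"   # what --output=stdout.%j expands to
error_file="stderr.${jobid}"    # what --error=stderr.%j expands to
echo "$output_file $error_file"
```

Using %j keeps output from repeated submissions of the same script from overwriting each other.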
Parallel job
→ Read this note before submitting parallel jobs.
For a parallel application using 4 cores on a single node:
```bash
#!/bin/bash
#SBATCH --job-name=mycode
#SBATCH --output=stdout.%j
#SBATCH --error=stderr.%j
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --time=02:00:00

./my_parallel_code.exe
```
Using --nodes=1 ensures all 4 tasks run on the same node. For MPI applications, use srun to launch:
```
srun ./my_mpi_code.exe
```
Advanced options
Step 2: Submit job
Batch job
Submit your batch script using sbatch:
```
sbatch run_mycode.sh
```
Slurm returns a job ID that you can use to monitor the job.
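In scripts it is often convenient to capture that job ID. A minimal sketch (the message below is the standard sbatch confirmation format, with an example ID):

```shell
# sbatch prints "Submitted batch job <ID>" on success; the ID is the last word.
# (Alternatively, `sbatch --parsable run_mycode.sh` prints only the ID.)
msg="Submitted batch job 12345"   # example sbatch output
jobid=${msg##* }                  # strip everything up to the last space
echo "$jobid"
```

In a real submission you would pipe sbatch's output through this, or simply use `jobid=$(sbatch --parsable run_mycode.sh)`.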
Interactive job
For GUI applications requiring a display, use an HPC-VCL node.

Production jobs should always use batch scripts. For testing and debugging, request a short interactive session using salloc:
```
salloc --ntasks=4 --nodes=1 --time=00:30:00
```
This allocates 4 cores on one node for 30 minutes. Once allocated, you'll have a shell on a compute node.
For a serial job with exclusive node access:
```
salloc --ntasks=1 --exclusive --time=00:30:00
```
Interactive sessions should be kept brief. Idle sessions may be terminated per the Acceptable Use Policy.
Step 3: Monitor job
Job status
Use squeue to check your jobs:
```
$ squeue -u $USER
 JOBID PARTITION     NAME     USER ST  TIME NODES NODELIST
 12345  standard   mycode  unityID  R  0:45     1 c001n01
```
Job states (ST): PD=pending, R=running, CG=completing.
For detailed job information:
```
squeue -j JOBID -l
```
To see why a pending job is waiting (the %R field prints the allocated node list for running jobs and the pending reason for queued jobs):

```
squeue -j JOBID -o "%.18i %.9P %.8j %.8u %.2t %.10M %.6D %R"
```
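As a sketch of what to look for, here is a hypothetical output line for a pending job; the last field holds the reason, such as (Priority) or (Resources):

```shell
# Example squeue output line for a pending (PD) job; the values are
# illustrative. The final field shows why the job has not started.
line="12346 standard mycode unityID PD 0:00 1 (Priority)"
reason=${line##* }   # take the last whitespace-separated field
echo "Pending reason: $reason"
```

(Priority) means other jobs rank ahead of yours; (Resources) means the requested resources are not yet free.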
Cancel a job
Use scancel with the job ID:
```
scancel JOBID
```
To cancel all your jobs:
```
scancel -u $USER
```
Partition and node status
View partition information with sinfo:
```
$ sinfo
PARTITION AVAIL  TIMELIMIT NODES STATE NODELIST
standard*    up 4-00:00:00    50  idle c[001-050]
gpu          up 4-00:00:00    10   mix gpu[01-10]
```
The cluster status page provides detailed node availability.
Sample batch scripts
MPI job
```bash
#!/bin/bash
#SBATCH --job-name=hydro
#SBATCH --output=hydro.out.%j
#SBATCH --error=hydro.err.%j
#SBATCH --ntasks=32
#SBATCH --time=02:00:00
#SBATCH --partition=compute

module load PrgEnv-intel

srun ./hydro.exe
```
Runs an MPI code using 32 tasks for up to 2 hours. The srun command launches the MPI tasks across allocated nodes.
Hybrid MPI+OpenMP job
```bash
#!/bin/bash
#SBATCH --job-name=chemtest
#SBATCH --output=chemtest.out.%j
#SBATCH --error=chemtest.err.%j
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=8
#SBATCH --time=02:00:00
#SBATCH --exclusive

module load openmpi-gcc
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

srun ./chemtest.exe
```
Runs a hybrid code with 8 MPI tasks (4 per node) and 8 OpenMP threads per task. Uses --exclusive for dedicated node access. See hybrid jobs guide.
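The core accounting for this layout can be worked through explicitly (values taken from the script above):

```shell
# Core accounting for the hybrid job: nodes x ntasks-per-node gives the
# MPI task count; tasks x cpus-per-task gives the total core count.
nodes=2
ntasks_per_node=4
cpus_per_task=8
total_tasks=$((nodes * ntasks_per_node))      # 8 MPI tasks
total_cores=$((total_tasks * cpus_per_task))  # 64 cores in total
echo "${total_tasks} tasks, ${total_cores} cores"
```

Checking this arithmetic against the node size before submitting helps avoid over- or under-subscribing cores.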
GPU job
```bash
#!/bin/bash
#SBATCH --job-name=nnetworks
#SBATCH --output=out.%j
#SBATCH --error=err.%j
#SBATCH --ntasks=1
#SBATCH --time=00:30:00
#SBATCH --partition=gpu
#SBATCH --gres=gpu:a100:1

module load cuda

./nnetworks.exe
```
Runs a CUDA application using one A100 GPU. GPU type is required in the --gres directive (e.g., --gres=gpu:a100:1). Available types: a10, a30, a100, gtx1080, h100, h200, l40, l40s, p100, rtx_2080. See the GPU jobs guide and CUDA software page.
Command quick reference
Common commands
| Command | Description |
|---|---|
| sbatch script.sh | Submit batch job |
| squeue -u $USER | Show your jobs |
| scancel JOBID | Cancel a job |
| sinfo | Show partition status |
| salloc | Request interactive allocation |
| srun | Run parallel tasks |
| sacct -j JOBID | Job accounting info |
| seff JOBID | Job efficiency report |
Common directives
| Directive | Description |
|---|---|
| --job-name=NAME | Job name |
| --output=FILE | Stdout file (%j=jobid) |
| --error=FILE | Stderr file |
| --ntasks=N | Number of tasks |
| --nodes=N | Number of nodes |
| --time=HH:MM:SS | Time limit |
| --partition=NAME | Partition (queue) |
| --mem=SIZE | Memory per node |
| --gres=gpu:N | Request N GPUs |