Testing Phase: Slurm is currently being tested with limited hardware. See Deployment Status for current test resources and known limitations.

Submit batch jobs to run applications on compute nodes

The cluster uses the Slurm workload manager for scheduling jobs on compute nodes. Running applications directly on login nodes is not permitted. Users must submit a batch script or request an interactive session via Slurm.

A batch script is a text file containing resource requirements (cores, time, memory) and the commands to run your application. Requirements are specified with directives that begin with #SBATCH near the top of the script.


Step 1: Create batch script

Serial job

The following batch script run_mycode.sh runs a serial application mycode.exe:

#!/bin/bash
#SBATCH --job-name=mycode
#SBATCH --output=stdout.%j
#SBATCH --error=stderr.%j
#SBATCH --ntasks=1
#SBATCH --time=02:00:00

./mycode.exe

The #SBATCH directives specify job parameters:

  • --ntasks=1 requests a single task (one CPU core)
  • --time=02:00:00 sets a 2-hour time limit
  • --job-name sets the job name displayed by squeue
  • --output and --error specify where stdout and stderr are written (%j is replaced by the job ID)

Parallel job


For a parallel application using 4 cores on a single node:

#!/bin/bash
#SBATCH --job-name=mycode
#SBATCH --output=stdout.%j
#SBATCH --error=stderr.%j
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --time=02:00:00

./my_parallel_code.exe

Using --nodes=1 ensures all 4 tasks run on the same node. For MPI applications, use srun to launch:

srun ./my_mpi_code.exe


Step 2: Submit job

Batch job

Submit your batch script using sbatch:

sbatch run_mycode.sh

Slurm returns a job ID that you can use to monitor the job.
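To capture the job ID in a script of your own, sbatch's --parsable option prints just the ID without the surrounding text; a minimal sketch using the script from Step 1:

```shell
# Submit and capture the numeric job ID
# (--parsable suppresses the "Submitted batch job" wrapper text)
jobid=$(sbatch --parsable run_mycode.sh)
echo "Submitted as job $jobid"

# The captured ID can then be fed to other commands, e.g.:
# squeue -j "$jobid"
```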

Interactive job

For GUI applications requiring a display, use an HPC-VCL node instead.

Production jobs should always use batch scripts. For testing and debugging, request a short interactive session using salloc:

salloc --ntasks=4 --nodes=1 --time=00:30:00

This allocates 4 cores on one node for 30 minutes. Once allocated, you'll have a shell on a compute node.
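Inside the allocation, ordinary commands run once in your shell on the compute node, while srun launches one copy per allocated task; a quick sanity check might look like:

```shell
# Runs once, in the allocation shell:
hostname

# Runs one copy per allocated task (4 copies here):
srun hostname
```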

For a serial job with exclusive node access:

salloc --ntasks=1 --exclusive --time=00:30:00

Interactive sessions should be kept brief. Idle sessions may be terminated per the Acceptable Use Policy.

Step 3: Monitor job

Job status

Use squeue to check your jobs:

$ squeue -u $USER
JOBID  PARTITION  NAME    USER     ST  TIME  NODES  NODELIST
12345  standard   mycode  unityID  R   0:45  1      c001n01

Job states (ST): PD=pending, R=running, CG=completing.
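To list only jobs in a particular state, squeue accepts a state filter via its standard -t/--states option; for example:

```shell
# Show only your pending (PD) jobs
squeue -u $USER -t PD

# Show only your running (R) jobs
squeue -u $USER -t R
```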

For detailed job information:

squeue -j JOBID -l

To see why a pending job is waiting (the %R field at the end shows the reason):

squeue -j JOBID -o "%.18i %.9P %.8j %.8u %.2t %.10M %.6D %R"

Cancel a job

Use scancel with the job ID:

scancel JOBID

To cancel all your jobs:

scancel -u $USER
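scancel can also target a subset of your jobs using its standard state and name filters; for example:

```shell
# Cancel only your pending jobs
scancel -u $USER --state=PENDING

# Cancel all of your jobs with a given name
scancel -u $USER --name=mycode
```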

Partition and node status

View partition information with sinfo:

$ sinfo
PARTITION  AVAIL  TIMELIMIT   NODES  STATE  NODELIST
standard*  up     4-00:00:00  50     idle   c[001-050]
gpu        up     4-00:00:00  10     mix    gpu[01-10]

The cluster status page provides detailed node availability.

MPI job

#!/bin/bash
#SBATCH --job-name=hydro
#SBATCH --output=hydro.out.%j
#SBATCH --error=hydro.err.%j
#SBATCH --ntasks=32
#SBATCH --time=02:00:00
#SBATCH --partition=compute

module load PrgEnv-intel
srun ./hydro.exe

Runs an MPI code using 32 tasks for up to 2 hours. The srun command launches the MPI tasks across allocated nodes.
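A common sanity check before launching a real MPI run is to have srun launch hostname with the same allocation, which shows how tasks were spread across nodes; a sketch:

```shell
# One line of output per task; the count next to each
# hostname is the number of tasks placed on that node
srun hostname | sort | uniq -c
```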

Hybrid MPI+OpenMP job

#!/bin/bash
#SBATCH --job-name=chemtest
#SBATCH --output=chemtest.out.%j
#SBATCH --error=chemtest.err.%j
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=8
#SBATCH --time=02:00:00
#SBATCH --exclusive

module load openmpi-gcc
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun ./chemtest.exe

Runs a hybrid code with 8 MPI tasks (4 per node) and 8 OpenMP threads per task. Uses --exclusive for dedicated node access. See hybrid jobs guide.
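The layout arithmetic for the script above: total MPI tasks = nodes × ntasks-per-node, and total cores = tasks × cpus-per-task. As a quick check:

```shell
nodes=2
tasks_per_node=4
cpus_per_task=8

tasks=$((nodes * tasks_per_node))   # 8 MPI tasks
cores=$((tasks * cpus_per_task))    # 64 cores total
echo "$tasks tasks, $cores cores"
```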

GPU job

#!/bin/bash
#SBATCH --job-name=nnetworks
#SBATCH --output=out.%j
#SBATCH --error=err.%j
#SBATCH --ntasks=1
#SBATCH --time=00:30:00
#SBATCH --partition=gpu
#SBATCH --gres=gpu:a100:1

module load cuda
./nnetworks.exe

Runs a CUDA application using one A100 GPU. GPU type is required in the --gres directive (e.g., --gres=gpu:a100:1). Available types: a10, a30, a100, gtx1080, h100, h200, l40, l40s, p100, rtx_2080. See the GPU jobs guide and CUDA software page.
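Inside a GPU job, Slurm exposes the allocated device(s) through CUDA_VISIBLE_DEVICES; a quick way to confirm the GPU is visible from within the batch script (nvidia-smi ships with the NVIDIA driver):

```shell
# Device indices Slurm assigned to this job
echo "$CUDA_VISIBLE_DEVICES"

# List the allocated GPU(s) by name and UUID
nvidia-smi -L
```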

Common commands

Command            Description
sbatch script.sh   Submit batch job
squeue -u $USER    Show your jobs
scancel JOBID      Cancel a job
sinfo              Show partition status
salloc             Request interactive allocation
srun               Run parallel tasks
sacct -j JOBID     Job accounting info
seff JOBID         Job efficiency report
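For finished jobs, sacct's --format option selects which accounting fields to print, and seff summarizes CPU and memory efficiency; a sketch (JOBID is a placeholder for a real job ID):

```shell
# Selected accounting fields for a completed job
sacct -j JOBID --format=JobID,JobName,State,Elapsed,MaxRSS

# Efficiency summary (run after the job completes)
seff JOBID
```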

Common directives

Directive           Description
--job-name=NAME     Job name
--output=FILE       Stdout file (%j = job ID)
--error=FILE        Stderr file
--ntasks=N          Number of tasks
--nodes=N           Number of nodes
--time=HH:MM:SS     Time limit
--partition=NAME    Partition (queue)
--mem=SIZE          Memory per node
--gres=gpu:TYPE:N   Request N GPUs of TYPE
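A sketch combining several of the directives above into one script (the partition name and memory value are illustrative; check your cluster's actual partitions and limits):

```shell
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --output=out.%j
#SBATCH --error=err.%j
#SBATCH --ntasks=8
#SBATCH --nodes=1
#SBATCH --time=01:00:00
#SBATCH --partition=standard   # illustrative partition name
#SBATCH --mem=16G              # memory per node, illustrative value

srun ./example.exe
```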