Testing Phase: Slurm is currently being tested with limited hardware. See Deployment Status for current test resources and known limitations.

Submit batch jobs to run applications on compute nodes

The cluster uses the Slurm workload manager for scheduling jobs on compute nodes. Running applications directly on login nodes is not permitted. Users must submit a batch script or request an interactive session via Slurm.

A batch script is a text file containing resource requirements (cores, time, memory) and the commands to run your application. Requirements are specified with directives that begin with #SBATCH near the top of the script.


Step 1: Create batch script

Serial job

The following batch script run_mycode.sh runs a serial application mycode.exe:

#!/bin/bash
#SBATCH --job-name=mycode
#SBATCH --output=stdout.%j
#SBATCH --error=stderr.%j
#SBATCH --ntasks=1
#SBATCH --time=02:00:00

./mycode.exe

The #SBATCH directives specify job parameters:

  • --ntasks=1 requests a single task (one CPU core)
  • --time=02:00:00 sets a 2-hour time limit
  • --job-name sets the job name displayed by squeue
  • --output and --error specify where stdout and stderr are written (%j is replaced by the job ID)

Parallel job


For a parallel application using 4 cores on a single node:

#!/bin/bash
#SBATCH --job-name=mycode
#SBATCH --output=stdout.%j
#SBATCH --error=stderr.%j
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --time=02:00:00

./my_parallel_code.exe

Using --nodes=1 ensures all 4 tasks run on the same node. For MPI applications, use srun to launch:

srun ./my_mpi_code.exe


Step 2: Submit job

Batch job

Submit your batch script using sbatch:

sbatch run_mycode.sh

Slurm returns a job ID that you can use to monitor the job.
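To capture the job ID in a script of your own, sbatch's --parsable option prints just the ID without the surrounding text; a minimal sketch using the script from Step 1:

```shell
# Submit and capture the numeric job ID
# (--parsable suppresses the "Submitted batch job" wrapper text)
jobid=$(sbatch --parsable run_mycode.sh)
echo "Submitted as job $jobid"

# The captured ID can then be fed to other commands, e.g.:
# squeue -j "$jobid"
```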

Interactive job

For GUI applications requiring a display, use an HPC-VCL node instead.

Production jobs should always use batch scripts. For testing and debugging, request a short interactive session using salloc:

salloc --ntasks=4 --nodes=1 --time=00:30:00

This allocates 4 cores on one node for 30 minutes. Once allocated, you'll have a shell on a compute node.
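Inside the allocation, ordinary commands run once in your shell on the compute node, while srun launches one copy per allocated task; a quick sanity check might look like:

```shell
# Runs once, in the allocation shell:
hostname

# Runs one copy per allocated task (4 copies here):
srun hostname
```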

For a serial job with exclusive node access:

salloc --ntasks=1 --exclusive --time=00:30:00

Interactive sessions should be kept brief. Idle sessions may be terminated per the Acceptable Use Policy.

Step 3: Monitor job

Job status

Use squeue to check your jobs:

$ squeue -u $USER
JOBID  PARTITION  NAME    USER     ST  TIME  NODES  NODELIST
12345  standard   mycode  unityID  R   0:45  1      c001n01

Job states (ST): PD=pending, R=running, CG=completing.
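To list only jobs in a particular state, squeue accepts a state filter via its standard -t/--states option; for example:

```shell
# Show only your pending (PD) jobs
squeue -u $USER -t PD

# Show only your running (R) jobs
squeue -u $USER -t R
```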

For detailed job information:

squeue -j JOBID -l

To see why a pending job is waiting (the %R field at the end shows the reason):

squeue -j JOBID -o "%.18i %.9P %.8j %.8u %.2t %.10M %.6D %R"

Cancel a job

Use scancel with the job ID:

scancel JOBID

To cancel all your jobs:

scancel -u $USER
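scancel can also target a subset of your jobs using its standard state and name filters; for example:

```shell
# Cancel only your pending jobs
scancel -u $USER --state=PENDING

# Cancel all of your jobs with a given name
scancel -u $USER --name=mycode
```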

Partition and node status

View partition information with sinfo:

$ sinfo
PARTITION  AVAIL  TIMELIMIT   NODES  STATE  NODELIST
standard*  up     4-00:00:00  50     idle   c[001-050]
gpu        up     4-00:00:00  10     mix    gpu[01-10]

The cluster status page provides detailed node availability.

MPI job

#!/bin/bash
#SBATCH --job-name=hydro
#SBATCH --output=hydro.out.%j
#SBATCH --error=hydro.err.%j
#SBATCH --ntasks=32
#SBATCH --time=02:00:00
#SBATCH --partition=compute

module load PrgEnv-intel
srun ./hydro.exe

Runs an MPI code using 32 tasks for up to 2 hours. The srun command launches the MPI tasks across allocated nodes.
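A common sanity check before launching a real MPI run is to have srun launch hostname with the same allocation, which shows how tasks were spread across nodes; a sketch:

```shell
# One line of output per task; the count next to each
# hostname is the number of tasks placed on that node
srun hostname | sort | uniq -c
```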

Hybrid MPI+OpenMP job

#!/bin/bash
#SBATCH --job-name=chemtest
#SBATCH --output=chemtest.out.%j
#SBATCH --error=chemtest.err.%j
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=8
#SBATCH --time=02:00:00
#SBATCH --exclusive

module load openmpi-gcc
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun ./chemtest.exe

Runs a hybrid code with 8 MPI tasks (4 per node) and 8 OpenMP threads per task. Uses --exclusive for dedicated node access. See hybrid jobs guide.
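The layout arithmetic for the script above: total MPI tasks = nodes × ntasks-per-node, and total cores = tasks × cpus-per-task. As a quick check:

```shell
nodes=2
tasks_per_node=4
cpus_per_task=8

tasks=$((nodes * tasks_per_node))   # 8 MPI tasks
cores=$((tasks * cpus_per_task))    # 64 cores total
echo "$tasks tasks, $cores cores"
```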

GPU job

#!/bin/bash
#SBATCH --job-name=nnetworks
#SBATCH --output=out.%j
#SBATCH --error=err.%j
#SBATCH --ntasks=1
#SBATCH --time=00:30:00
#SBATCH --partition=gpu
#SBATCH --gres=gpu:a100:1

module load cuda
./nnetworks.exe

Runs a CUDA application using one A100 GPU. GPU type is required in the --gres directive (e.g., --gres=gpu:a100:1). Available types: a10, a30, a100, gtx1080, h100, h200, l40, l40s, p100, rtx_2080. See the GPU jobs guide and CUDA software page.
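Inside a GPU job, Slurm exposes the allocated device(s) through CUDA_VISIBLE_DEVICES; a quick way to confirm the GPU is visible from within the batch script (nvidia-smi ships with the NVIDIA driver):

```shell
# Device indices Slurm assigned to this job
echo "$CUDA_VISIBLE_DEVICES"

# List the allocated GPU(s) by name and UUID
nvidia-smi -L
```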

Common commands

Command            Description
sbatch script.sh   Submit batch job
squeue -u $USER    Show your jobs
scancel JOBID      Cancel a job
sinfo              Show partition status
salloc             Request interactive allocation
srun               Run parallel tasks
sacct -j JOBID     Job accounting info
seff JOBID         Job efficiency report
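For finished jobs, sacct's --format option selects which accounting fields to print, and seff summarizes CPU and memory efficiency; a sketch (JOBID is a placeholder for a real job ID):

```shell
# Selected accounting fields for a completed job
sacct -j JOBID --format=JobID,JobName,State,Elapsed,MaxRSS

# Efficiency summary (run after the job completes)
seff JOBID
```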

Common directives

Directive           Description
--job-name=NAME     Job name
--output=FILE       Stdout file (%j = job ID)
--error=FILE        Stderr file
--ntasks=N          Number of tasks
--nodes=N           Number of nodes
--time=HH:MM:SS     Time limit
--partition=NAME    Partition (queue)
--mem=SIZE          Memory per node
--gres=gpu:TYPE:N   Request N GPUs of TYPE
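A sketch combining several of the directives above into one script (the partition name and memory value are illustrative; check your cluster's actual partitions and limits):

```shell
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --output=out.%j
#SBATCH --error=err.%j
#SBATCH --ntasks=8
#SBATCH --nodes=1
#SBATCH --time=01:00:00
#SBATCH --partition=standard   # illustrative partition name
#SBATCH --mem=16G              # memory per node, illustrative value

srun ./example.exe
```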