# Submitting Parallel Jobs with Slurm
Before submitting a job, examine the runtime behavior of your application. Jobs run with incorrect Slurm specifications can violate the Acceptable Use Policy, and you may be asked to terminate them. Read the application's documentation to determine its expected behavior, then confirm that behavior with a short test.
## General Guidelines
- Serial jobs: Use --ntasks=1. Do not request multiple cores for serial code.
- Memory intensive applications: Specify the maximum memory required with --mem or --mem-per-cpu. See estimating memory requirements.
- Threaded applications (OpenMP): Use --ntasks=1 --cpus-per-task=N where N is the number of threads. Set OMP_NUM_THREADS in your script:

  ```bash
  #SBATCH --ntasks=1
  #SBATCH --cpus-per-task=8

  export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
  ./my_threaded_program
  ```
- MPI applications: Use --ntasks=N where N is the number of MPI ranks. Use srun to launch:

  ```bash
  #SBATCH --ntasks=32

  srun ./my_mpi_program
  ```
- Shared-memory parallel (single node): Add --nodes=1 to ensure all tasks run on the same node:

  ```bash
  #SBATCH --nodes=1
  #SBATCH --ntasks=16
  ```
- Hybrid MPI+OpenMP: Request MPI tasks with --ntasks and threads per task with --cpus-per-task. Use --exclusive to avoid overloading nodes:

  ```bash
  #SBATCH --nodes=2
  #SBATCH --ntasks-per-node=4
  #SBATCH --cpus-per-task=8
  #SBATCH --exclusive

  export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
  srun ./my_hybrid_program
  ```

  See the hybrid jobs guide for details.
- Automatic threading: Some applications automatically use all available cores. You must verify the threading behavior and either:
  - Set environment variables to limit threads (e.g., OMP_NUM_THREADS, MKL_NUM_THREADS)
  - Request enough cores to match the threading
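For the automatic-threading case, a minimal batch-script sketch (the program name is a placeholder, and it assumes the application honors OMP_NUM_THREADS and MKL_NUM_THREADS):

```shell
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --time=00:30:00

# Pin the thread count to the allocated cores so the application
# does not silently grab every core on the node.
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
export MKL_NUM_THREADS=$SLURM_CPUS_PER_TASK

./my_auto_threaded_program   # placeholder for your application
```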
## MPI Compilers and Libraries
Three MPI environments are available for compiling and running parallel applications:
| Environment | Compiler | MPI Library | Module |
|---|---|---|---|
| GNU + OpenMPI | GCC 11.5 | OpenMPI 4.1.8 | openmpi-gcc/openmpi4.1.8-gcc11.5.0-slurm |
| Intel + Intel MPI | Intel 2025.3 | Intel MPI | PrgEnv-intel/2025.3-slurm |
| NVIDIA HPC SDK | NVHPC 26.1 | OpenMPI | PrgEnv-nvidia/26.1-slurm |
### GNU + OpenMPI
Use this environment for codes that compile with GCC:
```bash
#!/bin/bash
#SBATCH --job-name=mpi_gnu
#SBATCH --output=mpi.out.%j
#SBATCH --ntasks=32
#SBATCH --time=02:00:00

module load openmpi-gcc/openmpi4.1.8-gcc11.5.0-slurm
srun ./my_mpi_program
```
Compile with mpicc (C), mpicxx (C++), or mpif90 (Fortran):
```bash
module load openmpi-gcc/openmpi4.1.8-gcc11.5.0-slurm
mpicc -O2 -o my_mpi_program my_mpi_program.c
```
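The my_mpi_program.c above can be any MPI code; as a minimal sketch for verifying the toolchain (standard MPI calls only, nothing cluster-specific):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this task's rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of tasks */

    printf("Hello from rank %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}
```

Launched with srun under --ntasks=4, each of the four ranks prints its own line, which confirms that the compiler and MPI library are working together.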
### Intel + Intel MPI
Use this environment for codes that benefit from Intel optimizations or require Intel compilers:
```bash
#!/bin/bash
#SBATCH --job-name=mpi_intel
#SBATCH --output=mpi.out.%j
#SBATCH --ntasks=32
#SBATCH --time=02:00:00

module load PrgEnv-intel/2025.3-slurm
srun ./my_mpi_program
```
Compile with mpiicc (C), mpiicpc (C++), or mpiifort (Fortran):
```bash
module load PrgEnv-intel/2025.3-slurm
mpiicc -O2 -o my_mpi_program my_mpi_program.c
```
### NVIDIA HPC SDK
Use this environment for GPU-accelerated codes or codes using NVIDIA compilers:
```bash
#!/bin/bash
#SBATCH --job-name=mpi_nvidia
#SBATCH --output=mpi.out.%j
#SBATCH --partition=gpu
#SBATCH --gres=gpu:a100:2
#SBATCH --ntasks=2
#SBATCH --time=02:00:00

module load PrgEnv-nvidia/26.1-slurm
srun ./my_gpu_mpi_program
```
Compile with nvc (C), nvc++ (C++), or nvfortran (Fortran). For MPI, use the MPI wrappers:
```bash
module load PrgEnv-nvidia/26.1-slurm
mpicc -O2 -o my_mpi_program my_mpi_program.c
```
## Choosing an MPI Environment
- GNU + OpenMPI: Good default choice; widely compatible; open source
- Intel + Intel MPI: Often faster on Intel processors; includes MKL math library; better for codes using Intel-specific optimizations
- NVIDIA HPC SDK: Best for GPU-accelerated MPI codes; includes CUDA-aware MPI; supports OpenACC and CUDA Fortran
Important: Use the same module for compiling and running a program. Mixing environments causes link-time or runtime errors.
## How to Test Your Application
1. Start an interactive session:

   ```bash
   salloc --ntasks=4 --nodes=1 --time=00:30:00
   ```
2. Run your application:

   ```bash
   ./my_program &
   ```
3. Check CPU usage with top or htop:

   ```bash
   top -u $USER
   ```
Look at the %CPU column. Values over 100% indicate multiple threads. A 4-thread program shows ~400%.
4. Verify the thread/process count:

   ```bash
   ps -T -p $(pgrep -u $USER my_program) | wc -l
   ```

   The count includes one header line from ps, so a 4-thread program yields 5 lines.
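As an alternative that avoids miscounting the ps header line, on Linux each thread of a process appears as a directory under /proc/&lt;pid&gt;/task, so the directory entries can be counted directly. A minimal sketch:

```shell
# Count the threads of a process via /proc (Linux-only):
# each thread is one directory under /proc/<pid>/task,
# so no header line needs to be subtracted.
count_threads() {
    ls "/proc/$1/task" | wc -l
}

count_threads $$   # the shell itself is single-threaded, so this prints 1
```

Replace $$ with the PID of your application (e.g., from pgrep) to check its thread count.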
## Common Mistakes
| Mistake | Problem | Solution |
|---|---|---|
| Requesting multiple cores for serial code | Wastes resources, delays scheduling | Use --ntasks=1 |
| Not constraining to single node for shared-memory code | Tasks may be split across nodes, causing failure or poor performance | Add --nodes=1 |
| Requesting too few cores for auto-threading applications | Overloads node, affects other users | Set thread limits or request matching cores |
| Not using srun for MPI | May not properly launch across nodes | Use srun ./program instead of ./program |
| Hybrid job without --exclusive | Node oversubscription, poor performance | Add --exclusive for hybrid jobs |