Slurm Partitions and Resources
Available partitions, Quality of Service (QOS), and compute resources.
Partitions
Partitions group nodes by hardware type and access level. Specify a partition with #SBATCH --partition=NAME.
| Partition | Description | Access | Default QOS |
|---|---|---|---|
| compute | Standard CPU compute nodes | All users | normal |
| compute_partners | Extended CPU pool including partner nodes | Partner projects | p_<group> |
| gpu | Standard GPU nodes | All users | gpu |
| gpu_partners | Extended GPU pool including partner nodes | Partner projects | p_<group>_gpu |
If no partition is specified, the default partition (compute) is used. Partner projects should use compute_partners or gpu_partners to access their dedicated resources; the partition's default QOS is the partner's own p_<group> (or p_<group>_gpu) so jobs land on the partner allocation automatically — see Running Partner Jobs.
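As a minimal sketch of the default QOS routing described above, the batch script below requests a partner partition and deliberately omits `--qos`. The job name, resource sizes, and the `report_placement` helper are hypothetical; `SLURM_JOB_PARTITION` and `SLURM_JOB_QOS` are environment variables that Slurm sets inside a running job, and they fall back to `unset` here when the script runs outside Slurm.

```shell
#!/bin/bash
#SBATCH --job-name=partner-example     # hypothetical job name
#SBATCH --partition=compute_partners   # partner CPU partition
#SBATCH --ntasks=1
#SBATCH --time=01:00:00
# No --qos line: the partition's default QOS is expected to apply.

# Print where the job landed, using variables Slurm exports to the job.
report_placement() {
    echo "partition=${SLURM_JOB_PARTITION:-unset} qos=${SLURM_JOB_QOS:-unset}"
}

report_placement
```

Checking the reported QOS in the job output is a quick way to confirm that the job was routed to the partner allocation rather than to a shared QOS.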
Partition Relationships
The partner partitions include many nodes from the standard partitions plus additional partner-contributed nodes.
Quality of Service (QOS)
QOS controls job priority and resource limits. Specify one with #SBATCH --qos=NAME. Each partition has a default QOS.
| QOS | Priority | Max Wall Time | Description |
|---|---|---|---|
| normal | Standard | 4 days | Standard CPU jobs on compute partition |
| long | Standard | 10 days | Long-running CPU jobs on compute partition that need more than the standard 4-day limit |
| gpu | Standard | 4 days | Standard GPU jobs on gpu partition |
| short | Higher | 2 hours | Short jobs on compute_partners; access to idle partner nodes |
| short_gpu | Higher | 2 hours | Short jobs on gpu_partners; access to idle partner GPUs |
| p_<group> | Highest | Varies | Per-partner CPU QOS, allowed on compute_partners. Default for members of the partner project group when submitting to that partition. |
| p_<group>_gpu | Highest | Varies | Per-partner GPU QOS, allowed on gpu_partners. Default for members of the partner project group when submitting to that partition. |
The long QOS is available to all users on the compute partition for jobs that need more than the standard 4-day wall time, up to 10 days. Request it with #SBATCH --qos=long. Because long jobs hold resources for an extended period, set an accurate --time and use checkpointing where possible.
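One possible shape for a long job with checkpointing is sketched below. It assumes the application can write restart data on demand; the checkpoint file name, the `STEP` counter, and the `save_checkpoint` body are placeholders, not part of any cluster tooling. The `--signal` option is standard sbatch and asks Slurm to signal the batch shell shortly before the time limit.

```shell
#!/bin/bash
#SBATCH --partition=compute
#SBATCH --qos=long
#SBATCH --time=7-00:00:00      # 7 days; the long QOS allows up to 10
#SBATCH --signal=B:USR1@600    # send SIGUSR1 to the batch shell about 10 minutes before the limit

# Hypothetical checkpoint location and progress counter, for illustration only.
CHECKPOINT_FILE="${CHECKPOINT_FILE:-checkpoint.state}"
STEP="${STEP:-0}"

save_checkpoint() {
    # A real application would write its own restart data here.
    echo "step=${STEP}" > "$CHECKPOINT_FILE"
}

# Run the handler when Slurm delivers SIGUSR1.
trap save_checkpoint USR1

# The workload itself is omitted. Bash runs a trap only after the current
# foreground command returns, so a long-running program is usually started
# in the background and followed by `wait`:
#   ./my_simulation &    # hypothetical program
#   wait
```

On resubmission, the job would read the checkpoint file and resume from the recorded step instead of starting over.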
Each partner project gets its own p_<group> and (when applicable) p_<group>_gpu QOS. Limits are sized by the partner's CPU and GPU contributions to the cluster.
Partition and QOS Availability
Each partition allows specific QOS values. Jobs must use a QOS that is allowed in the requested partition:
| Partition | normal | long | gpu | short | short_gpu | p_<group> | p_<group>_gpu |
|---|---|---|---|---|---|---|---|
| compute | Yes | Yes | - | - | - | - | - |
| compute_partners | - | - | - | Yes | - | Yes | - |
| gpu | - | - | Yes | - | - | - | - |
| gpu_partners | - | - | - | - | Yes | - | Yes |
Note: The short and short_gpu QOS allow all users to access idle partner resources for jobs under 2 hours. The per-partner p_<group> and p_<group>_gpu entries are project-specific: only members of the corresponding partner project group can use them.
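The availability table above can be encoded as a small pre-submission check. This is only a sketch: `qos_allowed` is a hypothetical helper, `p_demo` stands in for a real partner group name, and the function checks the partition and QOS pairing only, not whether you are a member of the partner project.

```shell
#!/bin/bash
# Return 0 when the QOS is allowed on the partition, per the availability table.
# Usage: qos_allowed PARTITION QOS
qos_allowed() {
    case "$1:$2" in
        compute:normal|compute:long) return 0 ;;
        gpu:gpu)                     return 0 ;;
        compute_partners:short)      return 0 ;;
        gpu_partners:short_gpu)      return 0 ;;
        compute_partners:p_*_gpu)    return 1 ;;  # GPU partner QOS belongs on gpu_partners
        compute_partners:p_?*)       return 0 ;;
        gpu_partners:p_?*_gpu)       return 0 ;;
        *)                           return 1 ;;
    esac
}

# Report on a few pairings, using the hypothetical partner QOS p_demo.
for pair in "compute:long" "gpu:long" "compute_partners:p_demo" "gpu_partners:p_demo_gpu"; do
    if qos_allowed "${pair%%:*}" "${pair#*:}"; then
        echo "$pair allowed"
    else
        echo "$pair not allowed"
    fi
done
```

The order of the `case` patterns matters: the `p_*_gpu` rejection for compute_partners has to come before the broader `p_?*` match.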
See Job Priority and Fairshare for details on how QOS affects scheduling priority.
Partner Resources
Research groups that have purchased nodes for the cluster have access to the partner partitions (compute_partners / gpu_partners) and a per-project QOS (p_<group> / p_<group>_gpu) with elevated priority on partner-contributed hardware. Each partner project's CPU and GPU contributions raise its priority on the corresponding side of the account tree independently.
- Running Partner Jobs — how to submit, default QOS routing, and the dual _cpu/_gpu account model.
- Information about becoming an HPC partner.
Compute Node Types
The cluster contains several generations of compute hardware. You can request specific node types using --constraint.
CPU Nodes
| Constraint | CPU Model | Cores/Node | Memory |
|---|---|---|---|
| genoa | AMD EPYC 4th Gen | 192 | 768 GB |
| sapphirerapids | Intel Xeon 4th Gen | 64 | 256-512 GB |
| icelake | Intel Xeon 3rd Gen | 64 | 256 GB |
| cascadelake | Intel Xeon 2nd Gen | 32 | 192 GB |
| skylake | Intel Xeon Scalable | 32 | 192 GB |
| broadwell | Intel Xeon E5 v4 | 24 | 128 GB |
| haswell | Intel Xeon E5 v3 | 20 | 128 GB |
Example: Request Sapphire Rapids nodes:
#SBATCH --constraint=sapphirerapids
GPU Nodes
| GPU Type | GPU Memory | GPUs/Node | Request |
|---|---|---|---|
| NVIDIA H200 | 141 GB | 4 | --gres=gpu:h200:N |
| NVIDIA H100 | 80 GB | 4 | --gres=gpu:h100:N |
| NVIDIA A100 | 40/80 GB | 4 | --gres=gpu:a100:N |
| NVIDIA L40S | 48 GB | 4 | --gres=gpu:l40s:N |
| NVIDIA L40 | 48 GB | 4 | --gres=gpu:l40:N |
| NVIDIA A30 | 24 GB | 2 | --gres=gpu:a30:N |
| NVIDIA A10 | 24 GB | 2 | --gres=gpu:a10:N |
| NVIDIA P100 | 16 GB | 2 | --gres=gpu:p100:N |
| NVIDIA RTX 2080 | 8 GB | 4 | --gres=gpu:rtx_2080:N |
| NVIDIA GTX 1080 | 8 GB | 4 | --gres=gpu:gtx1080:N |
Example: Request 2 A100 GPUs:
#SBATCH --partition=gpu
#SBATCH --gres=gpu:a100:2
Additional Constraints
Beyond CPU architecture, you can constrain jobs by other node features using --constraint:
| Category | Constraint | Description |
|---|---|---|
| Vendor | intel, amd | CPU vendor |
| GPU Vendor | nvidia | Nodes with NVIDIA GPUs |
| Instruction Set | avx, avx2 | AVX vector instructions |
| Instruction Set | avx512 | AVX-512 (Skylake and newer) |
| Instruction Set | sse4_1, sse4_2 | SSE 4.x instructions |
| Network | ib | InfiniBand interconnect |
Combining Constraints
# AND - require both features
#SBATCH --constraint="intel&avx512"
# OR - accept either feature
#SBATCH --constraint="icelake|sapphirerapids"
Cluster Topology and Network Switches
Compute nodes are connected to each other through a message-passing network built from a set of leaf switches. (This is separate from the HPC private network: each compute node is dual-homed and also has a private connection back to the login nodes, the Slurm controller, and storage. The topology described here is only about the compute-node-to-compute-node message-passing network.) Not every switch on that network is cross-connected: some nodes sit behind Ethernet switches that form isolated “islands,” while others are attached to a high-speed InfiniBand (IB) fabric where every switch can reach every other. A multi-node job whose nodes land on two unconnected Ethernet switches will fail to communicate, so node placement matters for any job that spans more than one node.
The default: --switches=1
To keep multi-node jobs from being split across switches that can't talk to each other, the cluster adds --switches=1 to your submission when you don't specify --switches yourself. This confines every node in the job to a single switch. No maximum wait is attached, so the job stays pending until enough nodes are free on one switch. Single-node jobs are unaffected. This behavior is part of automatic submit-time processing — see Network switch placement on the Job Lifecycle page.
Relaxing the restriction safely
Confining a job to one switch can make it wait longer (or prevent it from starting at all) when it needs more nodes than any single switch provides. You can safely allow a job to span multiple switches only when its nodes are on the InfiniBand fabric, where cross-switch communication works. The safe pattern is to pin the job to IB nodes and then raise the switch count:
# Allow up to 2 switches, but only on the connected InfiniBand fabric
#SBATCH --constraint=ib
#SBATCH --switches=2
The --constraint=ib guarantees the nodes come from the connected fabric rather than an Ethernet island, and the higher --switches value lets the scheduler draw nodes from more than one IB switch. You can also attach a maximum wait so Slurm gives up on the tighter placement after a while and runs the job anyway:
# Prefer a single switch, but start after 30 min even if that means more
#SBATCH --constraint=ib
#SBATCH --switches=1@00:30:00
Do not raise --switches without also requesting --constraint=ib: doing so lets the job spread across unconnected Ethernet switches, where the nodes cannot communicate. Single-node jobs never need any of this.
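The rule above can be expressed as a small check to run before submitting. This is a sketch under stated assumptions: both function names are hypothetical, the constraint is assumed to be a plain `&`-joined feature list, and a request with a maximum wait (`COUNT@MAX_WAIT`) is treated as able to span switches, since Slurm may start the job on more switches once the wait expires.

```shell
#!/bin/bash
# True when "ib" is one of the &-joined features in a constraint string.
has_ib_constraint() {
    case "&$1&" in
        *"&ib&"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Usage: switch_request_is_safe NODES SWITCHES CONSTRAINT
# SWITCHES uses the sbatch form COUNT or COUNT@MAX_WAIT.
switch_request_is_safe() {
    count="${2%%@*}"
    # Single-node jobs are never split across switches.
    [ "$1" -le 1 ] && return 0
    # One switch with no maximum wait keeps the job on a single switch.
    if [ "$count" -le 1 ] && [ "$count" = "$2" ]; then
        return 0
    fi
    # Anything else may span switches, so it needs the IB fabric.
    has_ib_constraint "$3"
}

switch_request_is_safe 4 2 "ib" && echo "4 nodes, 2 switches, ib: ok"
switch_request_is_safe 4 2 ""   || echo "4 nodes, 2 switches, no ib: unsafe"
```

An OR expression such as `ib|icelake` is rejected by this check on purpose, because it does not guarantee that every node comes from the InfiniBand fabric.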
Checking Resource Availability
OIT provides a small set of helper commands on the login nodes — si, sa, and sqos — that summarize the most common availability and account queries in a more readable form than the raw Slurm commands. They take no setup; just run them. The underlying Slurm commands still work and are listed for reference. Add --help to any helper for its full set of options.
Node availability — si
si shows how many nodes are currently free (idle or mixed), grouped by CPU architecture on the compute partitions and by GPU model on the GPU partitions.
si                    # per-partition summary of free nodes, by architecture / GPU model
si -p gpu             # restrict to a single partition
si --memory           # summary grouped by total node memory size instead
si --all              # include nodes in every state (allocated, drained, down, ...)
si --nodes            # per-node table: available / allocated / offline / total cores
si --nodes --memory   # per-node free / allocated / total memory
si --nodes --gpus     # per-node free / allocated / total GPUs (GPU nodes only)
Equivalent native commands: sinfo, sinfo -N -l, sinfo -p gpu -o "%N %G %t".
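For scripts that need a rough free-node count without the helper, the native `sinfo` output can be summarized directly. The sketch below is not how si is implemented; `count_free_nodes` is a hypothetical function that reads `sinfo -h -o "%P %D %t"` output (partition, node count, compact state) and totals the nodes whose state is exactly idle or mix.

```shell
#!/bin/bash
# Sum idle and mixed nodes per partition from `sinfo -h -o "%P %D %t"` output.
count_free_nodes() {
    awk '
        { sub(/\*$/, "", $1) }                    # sinfo marks the default partition with a trailing *
        $3 == "idle" || $3 == "mix" { free[$1] += $2 }
        END { for (p in free) print p, free[p] }
    ' | sort
}

# Only query Slurm when sinfo is actually available on this machine.
if command -v sinfo >/dev/null 2>&1; then
    sinfo -h -o "%P %D %t" | count_free_nodes
fi
```

States carrying a suffix (for example a powered-down or non-responding marker) are skipped here, so the totals are a conservative estimate.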
Your accounts and QOS — sa and sqos
sa lists your Slurm associations — the accounts you can charge to, the partitions, your default QOS, and the full QOS list for each. sqos answers the more practical question of which QOS you can actually use, on which partitions, and with what limits (wall time, and per-user / per-job CPU, GPU, and memory caps).
sa          # your associations (account, partition, default QOS, allowed QOS)
sa --tree   # your place in the account hierarchy, back to root
sa alice    # another user's associations
sqos        # QOS you can submit with, their partitions, and limits
sqos -v     # also show priority and flags, plus the source associations
Equivalent native command: sacctmgr show assoc user=$USER format=account,qos,maxcpus,maxnodes.
Your fairshare
Fairshare determines your scheduling priority relative to other users — see Job Priority and Fairshare for how it is calculated.
sshare -u $USER
Cluster Status
For a graphical view of node availability, see the cluster status page.
See Also
- Job Priority and Fairshare - How Slurm determines which jobs run first
- Job Lifecycle: Network switch placement - The automatic --switches=1 default
- Running Partner Jobs - For members of HPC Partner projects
- Job Monitoring FAQ - Checking job status, efficiency, and cluster availability
- GPU Jobs - Detailed GPU job submission examples
- Array Jobs - Running many similar jobs efficiently
- Batch Job Templates - Example batch scripts