High Performance Computing | Slurm Partitions and Resources

Partitions

Partitions group nodes by hardware type and access level. Specify a partition with #SBATCH --partition=NAME.

Partition	Description	Access	Default QOS
`compute`	Standard CPU compute nodes	All users	normal
`compute_partners`	Extended CPU pool including partner nodes	Partner projects	p_<group>
`gpu`	Standard GPU nodes	All users	gpu
`gpu_partners`	Extended GPU pool including partner nodes	Partner projects	p_<group>_gpu

If no partition is specified, the default partition (compute) is used. Partner projects should use compute_partners or gpu_partners to access their dedicated resources. On those partitions the default QOS is the partner's own p_<group> (or p_<group>_gpu), so jobs land on the partner allocation automatically; see Running Partner Jobs.
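As a minimal sketch, the batch script below names the partition explicitly and prints the partition and QOS the job actually received. SLURM_JOB_PARTITION and SLURM_JOB_QOS are environment variables that Slurm sets inside a job; the fallback value not-in-a-job and the resource requests are illustrative assumptions, not site recommendations.

```shell
#!/bin/bash
#SBATCH --partition=compute      # same as the default; shown explicitly for clarity
#SBATCH --ntasks=1
#SBATCH --time=00:10:00          # illustrative wall time

# Slurm sets these inside a job; the fallbacks keep the script runnable outside Slurm.
partition="${SLURM_JOB_PARTITION:-not-in-a-job}"
qos="${SLURM_JOB_QOS:-not-in-a-job}"

echo "partition=${partition} qos=${qos}"
```

With no --qos line, the printed QOS should be the partition's default from the table above.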

Partition Relationships

The partner partitions include many nodes from the standard partitions plus additional partner-contributed nodes.

Quality of Service (QOS)

QOS controls job priority and resource limits. Specify with #SBATCH --qos=NAME. Each partition has a default QOS.

QOS	Priority	Max Wall Time	Description
`normal`	Standard	4 days	Standard CPU jobs on compute partition
`long`	Standard	10 days	Long-running CPU jobs on compute partition that need more than the standard 4-day limit
`gpu`	Standard	4 days	Standard GPU jobs on gpu partition
`short`	Higher	2 hours	Short jobs on compute_partners; access to idle partner nodes
`short_gpu`	Higher	2 hours	Short jobs on gpu_partners; access to idle partner GPUs
`p_<group>`	Highest	Varies	Per-partner CPU QOS, allowed on `compute_partners`. Default for members of the partner project group when submitting to that partition.
`p_<group>_gpu`	Highest	Varies	Per-partner GPU QOS, allowed on `gpu_partners`. Default for members of the partner project group when submitting to that partition.

The long QOS is available to all users on the compute partition for jobs that need more than the standard 4-day wall time, up to 10 days. Request it with #SBATCH --qos=long. Because long jobs hold resources for an extended period, set an accurate --time and use checkpointing where possible.
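The following is a sketch of a long job that restarts from a checkpoint, assuming the application can write and read one. The file name checkpoint.dat, the start_mode helper, and the commented-out my_simulation command are hypothetical placeholders; only the --partition, --qos, and --time values follow from the limits above.

```shell
#!/bin/bash
#SBATCH --partition=compute
#SBATCH --qos=long
#SBATCH --time=7-00:00:00        # 7 days: above the 4-day normal limit, within the 10-day long limit
#SBATCH --ntasks=1

# Hypothetical helper: report whether a checkpoint file exists at the given path.
start_mode() {
    if [ -f "$1" ]; then
        echo "resume"
    else
        echo "fresh"
    fi
}

checkpoint="${CHECKPOINT_FILE:-checkpoint.dat}"   # hypothetical file name
mode="$(start_mode "$checkpoint")"
echo "Start mode: ${mode} (checkpoint: ${checkpoint})"

# Hypothetical application call; replace with your own program and its restart options.
# ./my_simulation --"${mode}" "$checkpoint"
```

Resubmitting the same script after a timeout or node failure then picks up from the last checkpoint instead of starting over.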

Each partner project gets its own p_<group> and (when applicable) p_<group>_gpu QOS. Limits are sized by the partner's CPU and GPU contributions to the cluster.

Partition and QOS Availability

Each partition allows specific QOS values. Jobs must use a QOS that is allowed in the requested partition:

Partition	`normal`	`long`	`gpu`	`short`	`short_gpu`	`p_<group>`	`p_<group>_gpu`
`compute`	Yes	Yes	-	-	-	-	-
`compute_partners`	-	-	-	Yes	-	Yes	-
`gpu`	-	-	Yes	-	-	-	-
`gpu_partners`	-	-	-	-	Yes	-	Yes

Note: The short and short_gpu QOS allow all users to access idle partner resources for jobs of up to 2 hours. The per-partner p_<group> and p_<group>_gpu entries are project-specific: only members of the corresponding partner project group can use them.
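The availability table can be mirrored in a small pre-submit check. This is a sketch only: qos_allowed is a hypothetical helper, it assumes partner QOS names follow the p_<group> and p_<group>_gpu patterns shown above, and it does not check whether you belong to the partner group. The scheduler remains the authority on what is accepted.

```shell
#!/bin/bash
# Hypothetical helper: return 0 if the QOS is listed for the partition in the
# availability table, 1 otherwise.
qos_allowed() {
    local partition="$1" qos="$2"
    case "${partition}:${qos}" in
        compute:normal|compute:long) return 0 ;;
        gpu:gpu)                     return 0 ;;
        compute_partners:short)      return 0 ;;
        gpu_partners:short_gpu)      return 0 ;;
        gpu_partners:p_*_gpu)        return 0 ;;
        compute_partners:p_*_gpu)    return 1 ;;   # GPU partner QOS is not valid on the CPU pool
        compute_partners:p_*)        return 0 ;;
        *)                           return 1 ;;
    esac
}

qos_allowed compute long && echo "compute + long: allowed"
qos_allowed gpu short || echo "gpu + short: not allowed"
```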

See Job Priority and Fairshare for details on how QOS affects scheduling priority.

Partner Resources

Research groups that have purchased nodes for the cluster have access to the partner partitions (compute_partners / gpu_partners) and a per-project QOS (p_<group> / p_<group>_gpu) with elevated priority on partner-contributed hardware. Each partner project's CPU and GPU contributions raise its priority on the corresponding side of the account tree independently.

Running Partner Jobs — how to submit, default QOS routing, and the dual _cpu/_gpu account model.
Information about becoming an HPC partner.

Compute Node Types

The cluster contains several generations of compute hardware. You can request specific node types using --constraint.

CPU Nodes

Constraint	CPU Model	Cores/Node	Memory
`genoa`	AMD EPYC 4th Gen	192	768 GB
`sapphirerapids`	Intel Xeon 4th Gen	64	256-512 GB
`icelake_6326`	Intel Xeon 3rd Gen SP Gold 6326	64	256 GB
`icelake_8358`	Intel Xeon 3rd Gen SP Platinum 8358	64	256 GB
`cascadelake`	Intel Xeon 2nd Gen SP	32	192 GB
`skylake`	Intel Xeon 1st Gen SP	32	192 GB
`broadwell`	Intel Xeon E5 v4	24	128 GB
`haswell`	Intel Xeon E5 v3	20	128 GB

Example: Request Sapphire Rapids nodes:

#SBATCH --constraint=sapphirerapids

GPU Nodes

GPU Type	GPU Memory	GPUs/Node	Request
NVIDIA H200	141 GB	4	`--gres=gpu:h200:N`
NVIDIA H100	80 GB	4	`--gres=gpu:h100:N`
NVIDIA L40S	48 GB	4	`--gres=gpu:l40s:N`
NVIDIA L40	48 GB	4	`--gres=gpu:l40:N`
NVIDIA A100	40 GB	4	`--gres=gpu:a100:N`
NVIDIA A30	24 GB	2	`--gres=gpu:a30:N`
NVIDIA A10	24 GB	2	`--gres=gpu:a10:N`
NVIDIA P100	16 GB	2	`--gres=gpu:p100:N`
NVIDIA RTX 2080	8 GB	4	`--gres=gpu:rtx_2080:N`
NVIDIA GTX 1080	8 GB	4	`--gres=gpu:gtx_1080:N`

Example: Request 2 A100 GPUs:

#SBATCH --partition=gpu
#SBATCH --gres=gpu:a100:2
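
To confirm that a job received the GPUs it asked for, a sketch like the following counts the devices listed in CUDA_VISIBLE_DEVICES. It assumes this cluster's Slurm configuration exports that variable for GPU allocations, which is common but not stated on this page; count_gpus is a hypothetical helper.

```shell
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --gres=gpu:a100:2
#SBATCH --time=00:10:00          # illustrative wall time

# Hypothetical helper: count the entries in a comma-separated device list.
count_gpus() {
    local list="$1"
    if [ -z "$list" ]; then
        echo 0
        return
    fi
    local -a devices
    IFS=',' read -r -a devices <<< "$list"
    echo "${#devices[@]}"
}

echo "GPUs visible to this job: $(count_gpus "${CUDA_VISIBLE_DEVICES:-}")"
```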

Additional Constraints

Beyond CPU architecture, you can constrain jobs by other node features using --constraint:

Category	Constraint	Description
Vendor	`intel`, `amd`	CPU vendor
GPU Vendor	`nvidia`	Nodes with NVIDIA GPUs
Instruction Set	`avx`, `avx2`	AVX vector instructions
Instruction Set	`avx512`	AVX-512 (Skylake and newer)
Instruction Set	`sse4_1`, `sse4_2`	SSE 4.x instructions
Network	`ib`	InfiniBand interconnect

Combining Constraints

# AND - require both features
#SBATCH --constraint="intel&avx512"

# OR - accept either feature
#SBATCH --constraint="icelake_8358|sapphirerapids"

Cluster Topology and Network Switches

Compute nodes are connected to each other through a message-passing network built from a set of leaf switches. (This is separate from the HPC private network: each compute node is dual-homed and also has a private connection back to the login nodes, the Slurm controller, and storage. The topology described here is only about the compute-node-to-compute-node message-passing network.) Not every switch on that network is cross-connected: some nodes sit behind Ethernet switches that form isolated “islands,” while others are attached to a high-speed InfiniBand (IB) fabric where every switch can reach every other. A multi-node job whose nodes land on two unconnected Ethernet switches will fail to communicate, so node placement matters for any job that spans more than one node.

The default: `--switches=1`

To keep multi-node jobs from being split across switches that can't talk to each other, the cluster adds --switches=1 to your submission when you don't specify --switches yourself. This confines every node in the job to a single switch. No maximum wait is attached, so the job stays pending until enough nodes are free on one switch. Single-node jobs are unaffected. This behavior is part of automatic submit-time processing — see Network switch placement on the Job Lifecycle page.

Relaxing the restriction safely

Confining a job to one switch can make it wait longer (or prevent it from starting at all) when it needs more nodes than any single switch provides. You can safely allow a job to span multiple switches only when its nodes are on the InfiniBand fabric, where cross-switch communication works. The safe pattern is to pin the job to IB nodes and then raise the switch count:

# Allow up to 2 switches, but only on the connected InfiniBand fabric
#SBATCH --constraint=ib
#SBATCH --switches=2

The --constraint=ib guarantees the nodes come from the connected fabric rather than an Ethernet island, and the higher --switches value lets the scheduler draw nodes from more than one IB switch. You can also attach a maximum wait so Slurm gives up on the tighter placement after a while and runs the job anyway:

# Prefer a single switch, but start after 30 min even if that means more
#SBATCH --constraint=ib
#SBATCH --switches=1@00:30:00

Do not raise --switches without also requesting --constraint=ib: doing so lets the job spread across unconnected Ethernet switches, where the nodes cannot communicate. Single-node jobs never need any of this.
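
The rule above can be checked before submission with a small lint function. This is a sketch under stated assumptions: lint_switches is a hypothetical helper, it only recognizes the long-form #SBATCH --switches=N and #SBATCH --constraint=... spellings, and it does not understand OR expressions such as "ib|avx2", which could still admit non-IB nodes.

```shell
#!/bin/bash
# Hypothetical pre-submit check: fail if a job script raises --switches above 1
# without also naming the ib feature in a --constraint line.
lint_switches() {
    local script="$1" switches
    # First explicit switch count, ignoring any "@max-wait" suffix.
    switches="$(sed -n 's/^#SBATCH[[:space:]]\{1,\}--switches=\([0-9]\{1,\}\).*/\1/p' "$script" | head -n 1)"

    # No explicit --switches (the site default applies) or --switches=1: nothing to check.
    if [ -z "$switches" ] || [ "$switches" -le 1 ]; then
        return 0
    fi

    # Accept when a --constraint line names ib as a whole word, e.g. ib or "ib&avx512".
    if grep -Eq '^#SBATCH[[:space:]]+--constraint="?([^"]*[^[:alnum:]_])?ib([^[:alnum:]_]|$)' "$script"; then
        return 0
    fi

    echo "lint_switches: --switches=${switches} without --constraint=ib in ${script}" >&2
    return 1
}
```

A typical use would be lint_switches job.sh && sbatch job.sh, where job.sh is your own batch script.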

Checking Resource Availability

OIT provides a small set of helper commands on the login nodes — si, sa, and sqos — that summarize the most common availability and account queries in a more readable form than the raw Slurm commands. They take no setup; just run them. The underlying Slurm commands still work and are listed for reference. Add --help to any helper for its full set of options.

Resource availability — `si`

si prints one table: Partition, architecture, and Avail / Alloc / Total counts of a single resource. By default that resource is CPU cores; --gpus reports GPUs (architecture becomes the GPU model) and --memory reports memory in GiB. Total is the full deployed capacity, so a fully-busy architecture shows Avail 0 rather than disappearing.

si                      # cores per partition + CPU architecture
si -p gpu               # restrict to a single partition
si --gpus               # GPUs per partition + GPU model (GPU nodes only)
si --memory             # memory (GiB) per partition + architecture
si --all                # also include down nodes (adds a Down/Drain column)

si --nodes              # one row per node (cores)
si --nodes --gpus       # per-node GPUs
si --nodes --drain      # only drained/draining nodes, with reason
si --gpus --qos gpu     # only what a job under QOS 'gpu' could take right now

Equivalent native commands: sinfo, sinfo -N -l, sinfo -p gpu -o "%N %G %t %C".
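
When scripting around the native commands, note that sinfo's %C field reports CPU counts in the form allocated/idle/other/total. The sketch below pulls the idle count out of one such value; idle_cores is a hypothetical helper, and the sinfo call is skipped when the command is not on the PATH (for example, off the cluster).

```shell
#!/bin/bash
# Hypothetical helper: extract the idle count from a sinfo %C value,
# which has the form allocated/idle/other/total.
idle_cores() {
    local allocated idle other total
    IFS='/' read -r allocated idle other total <<< "$1"
    echo "$idle"
}

if command -v sinfo >/dev/null 2>&1; then
    # -h drops the header; head keeps the first line in case more than one is printed.
    summary="$(sinfo -h -p compute -o "%C" | head -n 1)"
    echo "Idle cores in compute: $(idle_cores "$summary")"
else
    echo "sinfo not found; run this on a login node"
fi
```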

Your accounts and QOS — `sa` and `sqos`

sa lists your Slurm associations — the accounts you can charge to, the partitions, your default QOS, and the full QOS list for each. sqos answers the more practical question of which QOS you can actually use, on which partitions, and with what limits (wall time, and per-user / per-job CPU, GPU, and memory caps).

sa                      # your associations (account, partition, default QOS, allowed QOS)
sa --tree               # your place in the account hierarchy, back to root
sa alice                # another user's associations

sqos                    # QOS you can submit with, their partitions, and limits
sqos -v                 # also show priority and flags, plus the source associations

Equivalent native command: sacctmgr show assoc user=$USER format=account,qos,maxcpus,maxnodes.

Your fairshare

Fairshare determines your scheduling priority relative to other users — see Job Priority and Fairshare for how it is calculated.

sshare -u $USER

Cluster Status

For a graphical view of node availability, see the cluster status page.
