Partitions

Partitions group nodes by hardware type and access level. Specify a partition with #SBATCH --partition=NAME.

Partition | Description | Access | Default QOS
compute | Standard CPU compute nodes | All users | normal
compute_partners | Extended CPU pool including partner nodes | Partner projects | p_<group>
gpu | Standard GPU nodes | All users | gpu
gpu_partners | Extended GPU pool including partner nodes | Partner projects | p_<group>_gpu

If no partition is specified, the default partition (compute) is used. Partner projects should submit to compute_partners or gpu_partners to reach their dedicated resources. On those partitions the default QOS is the partner's own p_<group> (or p_<group>_gpu), so jobs land on the partner allocation automatically; see Running Partner Jobs.
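The defaults described above can be summarized as a small lookup. This is only an illustrative sketch: default_qos is a hypothetical helper, not a command provided on the cluster, and mylab stands in for a real partner group name.

```shell
# Hypothetical helper: print the default QOS for a partition,
# following the Partition table above.
# Usage: default_qos PARTITION [GROUP]
default_qos() {
    dq_partition=$1
    dq_group=${2:-}
    case "$dq_partition" in
        compute) echo "normal" ;;
        gpu)     echo "gpu" ;;
        compute_partners|gpu_partners)
            # Partner partitions default to the per-project QOS
            if [ -z "$dq_group" ]; then
                echo "partner group required for $dq_partition" >&2
                return 1
            fi
            if [ "$dq_partition" = "gpu_partners" ]; then
                echo "p_${dq_group}_gpu"
            else
                echo "p_${dq_group}"
            fi
            ;;
        *)
            echo "unknown partition: $dq_partition" >&2
            return 1
            ;;
    esac
}

default_qos compute               # prints: normal
default_qos gpu_partners mylab    # prints: p_mylab_gpu
```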

Partition Relationships

The partner partitions include most nodes from the standard partitions plus additional partner-contributed nodes:

Venn diagram: the compute partition (available to all users) is mostly contained within the larger compute_partners partition (available to partner projects).
Venn diagram: the gpu partition (available to all users) is mostly contained within the larger gpu_partners partition (available to partner projects).

Quality of Service (QOS)

QOS controls job priority and resource limits. Specify a QOS with #SBATCH --qos=NAME. Each partition has a default QOS.

QOS | Priority | Max Wall Time | Description
normal | Standard | 4 days | Standard CPU jobs on the compute partition
long | Standard | 10 days | Long-running CPU jobs on the compute partition that need more than the standard 4-day limit
gpu | Standard | 4 days | Standard GPU jobs on the gpu partition
short | Higher | 2 hours | Short jobs on compute_partners; access to idle partner nodes
short_gpu | Higher | 2 hours | Short jobs on gpu_partners; access to idle partner GPUs
p_<group> | Highest | Varies | Per-partner CPU QOS, allowed on compute_partners. Default for members of the partner project group when submitting to that partition.
p_<group>_gpu | Highest | Varies | Per-partner GPU QOS, allowed on gpu_partners. Default for members of the partner project group when submitting to that partition.

The long QOS is available to all users on the compute partition for jobs that need more than the standard 4-day wall time, up to 10 days. Request it with #SBATCH --qos=long. Because long jobs hold resources for an extended period, set an accurate --time and use checkpointing where possible.
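As a rough illustration of the wall-time limits above, the sketch below chooses between normal and long from a requested wall time in whole hours. cpu_qos_for_hours is a hypothetical helper written for this page, not a cluster command, and it only covers the compute partition.

```shell
# Hypothetical helper: pick a CPU QOS on the compute partition from the
# requested wall time in whole hours (limits taken from the QOS table).
cpu_qos_for_hours() {
    cq_hours=$1
    if [ "$cq_hours" -le 96 ]; then         # 4 days = 96 hours
        echo "normal"
    elif [ "$cq_hours" -le 240 ]; then      # 10 days = 240 hours
        echo "long"
    else
        echo "no QOS on compute allows ${cq_hours} hours" >&2
        return 1
    fi
}

cpu_qos_for_hours 72     # prints: normal
cpu_qos_for_hours 168    # prints: long
```

For the 168-hour (7-day) case, the matching directives would be #SBATCH --qos=long together with #SBATCH --time=7-00:00:00.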

Each partner project gets its own p_<group> and (when applicable) p_<group>_gpu QOS. Limits are sized by the partner's CPU and GPU contributions to the cluster.

Partition and QOS Availability

Each partition allows specific QOS values. Jobs must use a QOS that is allowed in the requested partition:

Partition | normal | long | gpu | short | short_gpu | p_<group> | p_<group>_gpu
compute | Yes | Yes | - | - | - | - | -
compute_partners | - | - | - | Yes | - | Yes | -
gpu | - | - | Yes | - | - | - | -
gpu_partners | - | - | - | - | Yes | - | Yes

Note: The short and short_gpu QOS allow all users to access idle partner resources for jobs under 2 hours. The per-partner p_<group> and p_<group>_gpu entries are project-specific: only members of the corresponding partner project group can use them.
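The availability table can also be read as a simple pairing check, sketched below. qos_allowed is a hypothetical helper, not a cluster command, and mylab is a placeholder group name. It checks only the partition/QOS pairing; it does not know whether you are a member of the partner project.

```shell
# Hypothetical helper: succeed (exit 0) if QOS is allowed on PARTITION
# according to the availability table above.
# Usage: qos_allowed PARTITION QOS
qos_allowed() {
    case "$1:$2" in
        compute:normal|compute:long) return 0 ;;
        gpu:gpu)                     return 0 ;;
        compute_partners:short)      return 0 ;;
        gpu_partners:short_gpu)      return 0 ;;
        gpu_partners:p_?*_gpu)       return 0 ;;
        compute_partners:p_?*_gpu)   return 1 ;;  # GPU partner QOS is not valid on the CPU pool
        compute_partners:p_?*)       return 0 ;;
        *)                           return 1 ;;
    esac
}

if qos_allowed compute short; then
    echo "allowed"
else
    echo "not allowed"    # this branch runs: short is only valid on compute_partners
fi
```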

See Job Priority and Fairshare for details on how QOS affects scheduling priority.

Partner Resources

Research groups that have purchased nodes for the cluster have access to the partner partitions (compute_partners / gpu_partners) and a per-project QOS (p_<group> / p_<group>_gpu) with elevated priority on partner-contributed hardware. Each partner project's CPU and GPU contributions raise its priority on the corresponding side of the account tree independently.

Compute Node Types

The cluster contains several generations of compute hardware. You can request specific node types using --constraint.

CPU Nodes

Constraint | CPU Model | Cores/Node | Memory
genoa | AMD EPYC 4th Gen | 192 | 768 GB
sapphirerapids | Intel Xeon 4th Gen | 64 | 256-512 GB
icelake | Intel Xeon 3rd Gen | 64 | 256 GB
cascadelake | Intel Xeon 2nd Gen | 32 | 192 GB
skylake | Intel Xeon Scalable | 32 | 192 GB
broadwell | Intel Xeon E5 v4 | 24 | 128 GB
haswell | Intel Xeon E5 v3 | 20 | 128 GB

Example: Request Sapphire Rapids nodes:

#SBATCH --constraint=sapphirerapids
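The Cores/Node column in the CPU table determines how many nodes a job needs once it is pinned to one architecture. The sketch below works through that arithmetic, assuming one core per task. cores_per_node and nodes_needed are hypothetical helpers for illustration only.

```shell
# Cores per node for each --constraint value (from the CPU Nodes table).
cores_per_node() {
    case "$1" in
        genoa)                  echo 192 ;;
        sapphirerapids|icelake) echo 64 ;;
        cascadelake|skylake)    echo 32 ;;
        broadwell)              echo 24 ;;
        haswell)                echo 20 ;;
        *) echo "unknown constraint: $1" >&2; return 1 ;;
    esac
}

# Minimum number of nodes for TASKS single-core tasks: ceil(TASKS / cores).
# Usage: nodes_needed CONSTRAINT TASKS
nodes_needed() {
    nn_cores=$(cores_per_node "$1") || return 1
    echo $(( ($2 + nn_cores - 1) / nn_cores ))
}

nodes_needed sapphirerapids 200   # prints: 4
nodes_needed genoa 200            # prints: 2
```

Any result above 1 means a multi-node job, so the switch placement rules under Cluster Topology and Network Switches apply.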

GPU Nodes

GPU Type | GPU Memory | GPUs/Node | Request
NVIDIA H200 | 141 GB | 4 | --gres=gpu:h200:N
NVIDIA H100 | 80 GB | 4 | --gres=gpu:h100:N
NVIDIA A100 | 40/80 GB | 4 | --gres=gpu:a100:N
NVIDIA L40S | 48 GB | 4 | --gres=gpu:l40s:N
NVIDIA L40 | 48 GB | 4 | --gres=gpu:l40:N
NVIDIA A30 | 24 GB | 2 | --gres=gpu:a30:N
NVIDIA A10 | 24 GB | 2 | --gres=gpu:a10:N
NVIDIA P100 | 16 GB | 2 | --gres=gpu:p100:N
NVIDIA RTX 2080 | 8 GB | 4 | --gres=gpu:rtx_2080:N
NVIDIA GTX 1080 | 8 GB | 4 | --gres=gpu:gtx1080:N

Example: Request 2 A100 GPUs:

#SBATCH --partition=gpu
#SBATCH --gres=gpu:a100:2
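Because --gres counts are per node, N cannot exceed the GPUs/Node value in the table. The sketch below builds the option and rejects counts that no node can satisfy. gpus_per_node and gres_option are hypothetical helpers, not cluster commands.

```shell
# GPUs per node for each GPU type name used in --gres (from the GPU Nodes table).
gpus_per_node() {
    case "$1" in
        h200|h100|a100|l40s|l40|rtx_2080|gtx1080) echo 4 ;;
        a30|a10|p100)                             echo 2 ;;
        *) echo "unknown GPU type: $1" >&2; return 1 ;;
    esac
}

# Print a --gres option, or fail if COUNT is not between 1 and the per-node maximum.
# Usage: gres_option TYPE COUNT
gres_option() {
    go_max=$(gpus_per_node "$1") || return 1
    if [ "$2" -lt 1 ] || [ "$2" -gt "$go_max" ]; then
        echo "$1 nodes have $go_max GPUs; cannot request $2 per node" >&2
        return 1
    fi
    echo "--gres=gpu:$1:$2"
}

gres_option a100 2    # prints: --gres=gpu:a100:2
```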

Additional Constraints

Beyond CPU architecture, you can constrain jobs by other node features using --constraint:

Category | Constraint | Description
Vendor | intel, amd | CPU vendor
GPU Vendor | nvidia | Nodes with NVIDIA GPUs
Instruction Set | avx, avx2 | AVX vector instructions
Instruction Set | avx512 | AVX-512 (Skylake and newer)
Instruction Set | sse4_1, sse4_2 | SSE 4.x instructions
Network | ib | InfiniBand interconnect

Combining Constraints

# AND - require both features
#SBATCH --constraint="intel&avx512"

# OR - accept either feature
#SBATCH --constraint="icelake|sapphirerapids"
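When a constraint expression is assembled in a script rather than typed into an #SBATCH line, the & and | operators must stay quoted, because the shell otherwise treats them as "run in background" and "pipe". A minimal sketch, using a hypothetical join_constraints helper:

```shell
# Hypothetical helper: join feature names with a constraint operator,
# "&" for AND or "|" for OR, and print the resulting expression.
# Usage: join_constraints OPERATOR FEATURE...
join_constraints() {
    jc_op=$1
    shift
    jc_out=""
    for jc_feature in "$@"; do
        if [ -z "$jc_out" ]; then
            jc_out=$jc_feature
        else
            jc_out="$jc_out$jc_op$jc_feature"
        fi
    done
    printf '%s\n' "$jc_out"
}

join_constraints '&' intel avx512             # prints: intel&avx512
join_constraints '|' icelake sapphirerapids   # prints: icelake|sapphirerapids
```

On the command line the result would be passed inside double quotes, for example sbatch --constraint="$(join_constraints '&' intel avx512)" job.sh, where job.sh is a placeholder script name.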

Cluster Topology and Network Switches

Compute nodes are connected to each other through a message-passing network built from a set of leaf switches. (This is separate from the HPC private network: each compute node is dual-homed and also has a private connection back to the login nodes, the Slurm controller, and storage. The topology described here is only about the compute-node-to-compute-node message-passing network.) Not every switch on that network is cross-connected: some nodes sit behind Ethernet switches that form isolated “islands,” while others are attached to a high-speed InfiniBand (IB) fabric where every switch can reach every other. A multi-node job whose nodes land on two unconnected Ethernet switches will fail to communicate, so node placement matters for any job that spans more than one node.

The default: --switches=1

To keep multi-node jobs from being split across switches that can't talk to each other, the cluster adds --switches=1 to your submission when you don't specify --switches yourself. This confines every node in the job to a single switch. No maximum wait is attached, so the job stays pending until enough nodes are free on one switch. Single-node jobs are unaffected. This behavior is part of automatic submit-time processing — see Network switch placement on the Job Lifecycle page.

Relaxing the restriction safely

Confining a job to one switch can make it wait longer (or prevent it from starting at all) when it needs more nodes than any single switch provides. You can safely allow a job to span multiple switches only when its nodes are on the InfiniBand fabric, where cross-switch communication works. The safe pattern is to pin the job to IB nodes and then raise the switch count:

# Allow up to 2 switches, but only on the connected InfiniBand fabric
#SBATCH --constraint=ib
#SBATCH --switches=2

The --constraint=ib guarantees the nodes come from the connected fabric rather than an Ethernet island, and the higher --switches value lets the scheduler draw nodes from more than one IB switch. You can also attach a maximum wait so Slurm gives up on the tighter placement after a while and runs the job anyway:

# Prefer a single switch, but start after 30 min even if that means more
#SBATCH --constraint=ib
#SBATCH --switches=1@00:30:00

Do not raise --switches without also requesting --constraint=ib: doing so lets the job spread across unconnected Ethernet switches, where the nodes cannot communicate. Single-node jobs never need any of this.
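The rule above can be checked mechanically before submitting. The sketch below is a hypothetical lint, not a cluster tool. It assumes directives are written as #SBATCH --switches=... and #SBATCH --constraint=..., treats a max-wait value such as 1@00:30:00 as relaxing the restriction (since the job may eventually span switches), and only recognizes ib when it is one of the AND-ed features.

```shell
# Hypothetical lint: fail if a job script relaxes --switches without
# pinning the job to the InfiniBand fabric via --constraint=ib.
# Usage: check_switches JOB_SCRIPT
check_switches() {
    cs_script=$1
    # Value of the first "#SBATCH --switches=..." line, e.g. "2" or "1@00:30:00"
    cs_switches=$(sed -n 's/^#SBATCH[[:space:]]\{1,\}--switches=\([^[:space:]]*\).*/\1/p' "$cs_script" | head -n 1)
    # Not set (cluster default applies) or exactly 1 with no max wait: safe
    case "$cs_switches" in
        ''|1) return 0 ;;
    esac
    # Value of the first "#SBATCH --constraint=..." line, with quotes removed
    cs_constraint=$(sed -n 's/^#SBATCH[[:space:]]\{1,\}--constraint=\([^[:space:]]*\).*/\1/p' "$cs_script" | head -n 1 | tr -d "\"'")
    # Require "ib" as one of the AND-ed features
    case "&$cs_constraint&" in
        *"&ib&"*) return 0 ;;
    esac
    echo "$cs_script: --switches=$cs_switches without --constraint=ib" >&2
    return 1
}

# Example: a script that raises --switches but forgets --constraint=ib
cs_tmp=$(mktemp)
printf '%s\n' '#!/bin/bash' '#SBATCH --nodes=4' '#SBATCH --switches=2' > "$cs_tmp"
check_switches "$cs_tmp" || echo "unsafe: add --constraint=ib"
rm -f "$cs_tmp"
```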

Checking Resource Availability

OIT provides a small set of helper commands on the login nodes — si, sa, and sqos — that summarize the most common availability and account queries in a more readable form than the raw Slurm commands. They take no setup; just run them. The underlying Slurm commands still work and are listed for reference. Add --help to any helper for its full set of options.

Node availability — si

si shows how many nodes are currently free (idle or mixed), grouped by CPU architecture on the compute partitions and by GPU model on the GPU partitions.

si                      # per-partition summary of free nodes, by architecture / GPU model
si -p gpu               # restrict to a single partition
si --memory             # summary grouped by total node memory size instead
si --all                # include nodes in every state (allocated, drained, down, ...)

si --nodes              # per-node table: available / allocated / offline / total cores
si --nodes --memory     # per-node free / allocated / total memory
si --nodes --gpus       # per-node free / allocated / total GPUs (GPU nodes only)

Equivalent native commands: sinfo, sinfo -N -l, sinfo -p gpu -o "%N %G %t".

Your accounts and QOS — sa and sqos

sa lists your Slurm associations — the accounts you can charge to, the partitions, your default QOS, and the full QOS list for each. sqos answers the more practical question of which QOS you can actually use, on which partitions, and with what limits (wall time, and per-user / per-job CPU, GPU, and memory caps).

sa                      # your associations (account, partition, default QOS, allowed QOS)
sa --tree               # your place in the account hierarchy, back to root
sa alice                # another user's associations

sqos                    # QOS you can submit with, their partitions, and limits
sqos -v                 # also show priority and flags, plus the source associations

Equivalent native command: sacctmgr show assoc user=$USER format=account,qos,maxcpus,maxnodes.

Your fairshare

Fairshare determines your scheduling priority relative to other users — see Job Priority and Fairshare for how it is calculated.

sshare -u $USER

Cluster Status

For a graphical view of node availability, see the cluster status page.

See Also