At submission

When you run sbatch, salloc, or srun, a submit-time plugin inspects your request before the job is queued. It can adjust a request to make it schedule correctly, or reject a request that can't be satisfied — with an error explaining how to fix it — instead of letting the job sit pending forever. The adjustments below are applied automatically; in most cases you don't need to do anything differently.

What you do | What the cluster does
Submit without worrying about CPU/GPU account suffixes | Routes the job to the correct half of your account (account routing)
Request GPUs | Requires a GPU type; rejects untyped requests (GPU type enforcement)
Submit to the gpu partition without a QOS | Sets the gpu QOS for you (partition/QOS checks)
Don't specify a CPU architecture | Adds a homogeneity constraint so multi-node jobs land on one CPU generation (architecture)
Specify --constraint=icelake (etc.) | Adds the matching architecture resource so usage is billed correctly (architecture)
Don't specify --switches | Defaults to --switches=1 so a multi-node job stays on one network switch (switch placement)
Submit a partner job | Pins it to the hardware that partner purchased (partner pinning)

Account routing (CPU vs GPU)

Each group's allocation is tracked as two accounts — a CPU half (<group>_cpu) and a GPU half (<group>_gpu). You don't need to manage these suffixes. Keep submitting with --account=<group> (or omit --account entirely) and the cluster rewrites the account to the right half automatically, based on whether your job uses GPUs:

  • Jobs on compute / compute_partners → the _cpu half
  • Jobs on gpu / gpu_partners, or any job requesting a GPU → the _gpu half

If you explicitly name the wrong half for the partition (for example --account=smith_lab_gpu on the compute partition), the job is rejected with a message telling you which account to use. The resolved account is what appears in squeue, sacct, and the resource usage summary.
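
The routing rule above can be sketched as a small shell function. This is only an illustration of the documented behavior, not the plugin's code: resolved_account is a hypothetical helper, smith_lab is the example group used on this page, and treating every partition other than gpu / gpu_partners as CPU unless GPUs are requested is an assumption.

```shell
#!/bin/bash
# Hypothetical helper (not part of Slurm or the cluster tooling): predicts
# which half of <group> a job would be charged to under the rule above.
resolved_account() {
    local group=$1 partition=$2 gpus=${3:-0}

    # Jobs on the GPU partitions always go to the _gpu half.
    case "$partition" in
        gpu|gpu_partners) echo "${group}_gpu"; return ;;
    esac

    # Any job requesting a GPU also goes to the _gpu half.
    if [ "$gpus" -gt 0 ]; then
        echo "${group}_gpu"
    else
        echo "${group}_cpu"
    fi
}

resolved_account smith_lab compute 0   # prints smith_lab_cpu
resolved_account smith_lab gpu 2       # prints smith_lab_gpu
```

Comparing this prediction with the account shown by squeue or sacct is a quick way to confirm a job was routed as expected.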

GPU type enforcement

The cluster has several GPU models, billed at different rates, so GPU requests must name a type. An untyped request such as --gres=gpu:2 is rejected; use --gres=gpu:<type>:<count>:

#SBATCH --gres=gpu:a100:2        # 2 A100 GPUs

Available types: a10, a30, a100, gtx1080, h100, h200, l40, l40s, p100, rtx_2080. See the GPU jobs guide.
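
If you generate job scripts programmatically, a check like the one below can catch an untyped request before it reaches the cluster. It is a rough sketch: gres_is_typed is a hypothetical helper, the type list is copied from this page and may go stale, and only the documented gpu:<type>:<count> form is accepted.

```shell
#!/bin/bash
# GPU types copied from the list above; update if the cluster's list changes.
gpu_types="a10 a30 a100 gtx1080 h100 h200 l40 l40s p100 rtx_2080"

# Hypothetical pre-submit check: succeeds only for gpu:<type>:<count>
# where <type> is one of the types listed above.
gres_is_typed() {
    local gres=$1 type
    [[ $gres =~ ^gpu:([a-z0-9_]+):([1-9][0-9]*)$ ]] || return 1
    for type in $gpu_types; do
        if [ "$type" = "${BASH_REMATCH[1]}" ]; then
            return 0
        fi
    done
    return 1
}

if gres_is_typed "gpu:a100:2"; then echo "ok: gpu:a100:2"; fi
if ! gres_is_typed "gpu:2"; then echo "would be rejected: gpu:2 (no type)"; fi
```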

Partition / QOS checks

Mismatched partition/QOS combinations are caught at submission rather than leaving the job pending with a confusing reason. For the gpu partition, if you don't specify a QOS (or specify an incompatible one), the cluster sets --qos=gpu for you and tells you it has done so. Other incompatible combinations are rejected with the list of allowed QOS. Use sqos to see which QOS and partitions are open to you (see the monitoring FAQ).

CPU architecture and homogeneity

The compute partitions span several generations of CPU hardware. To keep multi-node jobs from being split across mismatched generations, the cluster handles the --constraint for you:

  • No architecture constraint given: a homogeneity constraint is added so all nodes in the job share one CPU generation. Any other constraints you set (e.g. avx512) are preserved and combined with it.
  • A single architecture given (e.g. --constraint=icelake): left as-is, and the matching architecture resource is added so the usage is billed against that hardware.
  • An "either/or" architecture given (e.g. --constraint="icelake|sapphirerapids"): left exactly as you wrote it — you've already expressed a preference.

You normally don't need to set --constraint at all. To target a specific generation, see the architecture list and constraint reference on the Partitions and Resources page. GPU-partition jobs skip this logic (GPU nodes are selected by GPU type instead).
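
The three cases above can be told apart with a short shell function. This is a sketch for illustration only: constraint_case is a hypothetical helper, the architecture names are just the ones that happen to appear on this page (the full list is on the Partitions and Resources page), and it assumes architecture names are only ever combined with "|".

```shell
#!/bin/bash
# Architecture feature names mentioned on this page (incomplete by design).
arch_names="icelake sapphirerapids cascadelake genoa"

# Hypothetical helper: reports which of the three documented cases a
# --constraint string falls into by counting the architecture names in it.
constraint_case() {
    local constraint=$1 token count=0
    local -a tokens
    # Split on the characters Slurm uses to combine features.
    IFS='&|,[]' read -ra tokens <<< "$constraint"
    for token in "${tokens[@]}"; do
        case " $arch_names " in
            *" $token "*) count=$((count + 1)) ;;
        esac
    done
    if [ "$count" -eq 0 ]; then
        echo "none: homogeneity constraint is added"
    elif [ "$count" -eq 1 ]; then
        echo "single: left as-is, architecture resource is added"
    else
        echo "either-or: left exactly as written"
    fi
}

constraint_case "avx512"
constraint_case "icelake"
constraint_case "icelake|sapphirerapids"
```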

Network switch placement

Some compute nodes sit on network switches that aren't cross-connected, so a multi-node job split across two of them would land on nodes that can't talk to each other. To prevent that, if you don't specify --switches, the cluster sets --switches=1 for you, so all the nodes in your job come from a single switch:

  • No --switches given: --switches=1 is added. There is no maximum wait attached, so the job waits until enough nodes are free on one switch — this keeps multi-node (MPI) jobs from being spread across switches that can't communicate.
  • You set --switches yourself (e.g. --switches=2): left exactly as you wrote it. Nodes on the InfiniBand fabric can communicate across switches, so jobs on that hardware (request it with --constraint=ib) can safely span more than one switch — raise the count if your job needs more nodes than a single switch provides.

Single-node jobs are unaffected. If a multi-node job stays pending longer than you expect, it may be waiting for room on one switch; requesting InfiniBand nodes and a higher --switches count can let it start sooner.
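
As an example of the second case, the snippet below writes a job script that requests InfiniBand nodes and allows the job to span two switches. It is a sketch under assumptions: the file name ib_job.sh, the program my_mpi_app, the job name, and the node and task counts are all hypothetical placeholders.

```shell
#!/bin/bash
# Write a hypothetical 4-node MPI job script that may span two switches.
cat > ib_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=mpi_ib
#SBATCH --partition=compute
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=8
#SBATCH --constraint=ib          # InfiniBand nodes can talk across switches
#SBATCH --switches=2             # overrides the default of --switches=1
#SBATCH --output=mpi_ib_%j.out

srun ./my_mpi_app                # placeholder for your MPI program
EOF

echo "Wrote ib_job.sh; submit it with: sbatch ib_job.sh"
```

Slurm also accepts a maximum wait on this option, in the form --switches=<count>@<max-time>, if you would rather have the job fall back to any placement after a set delay. Only use that on hardware where cross-switch placement is safe.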

Partner hardware pinning

Partner groups purchase specific hardware and submit with their partner QOS (p_<group>) on the compute_partners / gpu_partners partitions. To keep those jobs on the hardware the group paid for, the cluster:

  • CPU partner jobs: if you don't already give an architecture constraint, adds one matching the CPU generation(s) your group purchased (e.g. --constraint=genoa, or --constraint=[cascadelake|genoa] for more than one).
  • GPU partner jobs: validates that the GPU model you requested is one your group purchased; a model the group didn't buy is rejected with the list of allowed models.

This only applies when you submit with your partner QOS explicitly (--qos=p_<group>). See Partner Jobs for details on partner partitions and QOS.

At startup

When a GPU job begins, the cluster starts per-GPU statistics tracking (via NVIDIA DCGM) on each allocated node so the resource usage summary can report GPU utilization, memory, and energy at the end. This is fully automatic and has no effect on your job — there is nothing to configure. Non-GPU jobs skip it.

At completion — the resource usage summary

When a batch job finishes, the cluster appends an LSF-style Resource usage summary to the end of your job's standard-output file. It gives you an at-a-glance picture of how the job ran without having to query the accounting database:

------------------------------------------------------------
Resource usage summary:

    Job ID            : 123456
    Job name          : mycode
    User / Account    : unityID / smith_lab_cpu
    Partition / QOS   : compute / normal
    Submitted         : 2026-06-13T09:14:02
    Started           : 2026-06-13T09:14:20
    Wait time         : 18 sec
    Elapsed (wall)    : 01:42:37
    Nodes             : 1 (c042n03)
    CPUs allocated    : 8
    Tasks             : 8
    CPU time (total)  : 48213.55 sec  (user 47980.11, sys 233.44)
    CPU efficiency    : 97.8%
    Memory in use     : 6.41 GB  (sampled at job end; see seff for peak)
    Memory requested  : 16G
    Working directory : /home/unityID/run42
    Exit code         : 0:0

    For final accounting (Slurm's MaxRSS, billing, etc.):  seff 123456
------------------------------------------------------------

For multi-node jobs the CPU and memory sections add a per-node breakdown (total CPU time is summed across nodes; memory shows the maximum and the per-node values). For GPU jobs a GPU section lists each allocated GPU with its peak memory, SM and memory utilization, and energy:

    GPUs requested    : 1 x a100

    GPU stats:
        [gpu05 GPU 0 ] mem 18.43 GB, SM 87%, mem-util 41%, energy 1234567 J

Memory caveat: the Memory in use line is sampled at the moment the job ends, so it is a point-in-time value and usually lower than the true peak. For the authoritative peak memory (MaxRSS), billing, and final accounting, run seff <jobid> once the job has completed; the summary footer reminds you of this.

Notes:

  • The summary is written only for batch jobs (those submitted with sbatch, which have a standard-output file). Interactive sessions (salloc) and srun-only jobs don't get one because there is no job output file to append to. Use seff / sacct instead.
  • It is appended once, by the head node, after every step has finished, so it always lands at the very end of your output file.
  • For deeper monitoring and the accounting commands behind these numbers, see the Monitoring FAQ.
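
The CPU efficiency figure in the sample summary can be reproduced by hand from the lines around it. The sketch below assumes efficiency is total CPU time divided by (elapsed wall time × CPUs allocated), which is how seff defines it; the summary tool's exact formula and rounding are not documented here and may differ slightly. The input values are copied from the sample.

```shell
#!/bin/bash
# Values copied from the sample summary above.
cpu_seconds=48213.55     # CPU time (total)
elapsed="01:42:37"       # Elapsed (wall), HH:MM:SS (no day field handled)
cpus=8                   # CPUs allocated

# Convert HH:MM:SS to seconds; 10# forces base 10 so "08" is not read as octal.
IFS=: read -r h m s <<< "$elapsed"
elapsed_seconds=$((10#$h * 3600 + 10#$m * 60 + 10#$s))

# Assumed formula: 100 * CPU time / (elapsed * CPUs).
efficiency=$(LC_ALL=C awk -v c="$cpu_seconds" -v e="$elapsed_seconds" -v n="$cpus" \
    'BEGIN { printf "%.2f", 100 * c / (e * n) }')

echo "elapsed ${elapsed_seconds} s on ${cpus} CPUs, CPU efficiency ${efficiency}%"
```

This gives 97.88%, in line with the 97.8% shown in the sample. A low figure usually means the job requested more CPUs than it could keep busy.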

After completion — cleanup

Finally, a per-node cleanup step run by the system removes the temporary state left behind by the summary tooling (per-node statistics files and GPU stats records). This is purely housekeeping and has no effect on your results or output.


Related: Partitions and Resources · GPU Jobs · Partner Jobs · Monitoring FAQ