High Performance Computing | Running Partner Jobs

Am I a Partner Project Member?

Run this command from a login node to see your associations:

sa

You are a partner project member if you see a row with a partner partition and a partner QOS, e.g.:

   Account     Partition       QOS                          DefQOS
smithlab_cpu                   normal,short,gpu,short_gpu   normal
smithlab_cpu  compute_partners p_smithlab,short             p_smithlab
smithlab_gpu                   normal,short,gpu,short_gpu   normal
smithlab_gpu  gpu_partners     p_smithlab_gpu,short_gpu     p_smithlab_gpu

The p_<group> / p_<group>_gpu entries identify your partner project (smithlab in this example). If no partner QOS appears, your project does not have partner access.
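The same check can be scripted. The sketch below assumes the association list is also available from the standard `sacctmgr` command in pipe-delimited form (`-nP`); `sa` appears to be a site-provided shortcut, and its exact output format is not assumed here. The helper name `partner_rows` is hypothetical.

```shell
# Print association rows whose partition is a partner partition.
# Reads pipe-delimited rows (Account|Partition|QOS|DefQOS) on stdin
# and prints: account, partition, default QOS.
partner_rows() {
  awk -F'|' '$2 ~ /_partners$/ { print $1, $2, $4 }'
}

# Only query Slurm when sacctmgr is actually available (i.e. on the cluster).
if command -v sacctmgr >/dev/null 2>&1; then
  sacctmgr -nP show assoc user="$USER" \
    format=account,partition,qos,defaultqos | partner_rows
fi
```

No output means no partner rows were found, which matches the "no partner access" case described above.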

Your Project Has Two Halves

Every Slurm account on Hazel exists as both a CPU half (e.g. smithlab_cpu) and a GPU half (smithlab_gpu). You don't normally need to think about this:

Submit with --account=<group> (or omit --account entirely; the cluster then uses your default account, which is the CPU half) and the scheduler routes the job to the correct half based on the partition you chose:
- -p compute or -p compute_partners → <group>_cpu
- -p gpu or -p gpu_partners → <group>_gpu
The suffixed name (_cpu or _gpu) shows up in sacctmgr queries and in sacct output, so you can see which half a job ran under.
You can specify the suffixed name explicitly (--account=smithlab_cpu), but if it does not match the partition (e.g. --account=smithlab_gpu -p compute), the submission is rejected with an error.
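The routing rule can be written out as a small lookup. This is only an illustration of the mapping described above, not the scheduler's own code; the function name `account_half` is hypothetical.

```shell
# Map an unsuffixed group name and a partition to the account half
# the job would be routed to, per the rule described above.
account_half() {
  group=$1
  partition=$2
  case "$partition" in
    compute|compute_partners) echo "${group}_cpu" ;;
    gpu|gpu_partners)         echo "${group}_gpu" ;;
    *) echo "unknown partition: $partition" >&2; return 1 ;;
  esac
}

account_half smithlab gpu_partners   # prints smithlab_gpu
```

A helper like this can be handy when matching `sacct` output (which shows the suffixed name) back to the partition a job was submitted to.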

Partner Partitions

Partner projects have access to two extended partitions in addition to the standard partitions:

Partition	Includes	Who can submit
`compute_partners`	All standard CPU nodes plus partner-contributed CPU nodes	Partner project members
`gpu_partners`	All standard GPU nodes plus partner-contributed GPU nodes	Partner project members

See Partitions and Resources for the full list of partitions and the Venn diagrams showing how partner partitions overlap with standard ones.

Partner QOS

When a partner project is set up, up to two QOS are created, depending on what the project contributed:

QOS	Allowed in partition	Purpose
`p_<group>`	`compute_partners`	Higher priority on partner-contributed CPU nodes
`p_<group>_gpu`	`gpu_partners`	Higher priority on partner-contributed GPU nodes

Each partner partition also allows a short-job QOS for quick test work without a priority allocation:

- compute_partners allows p_<group> and short.
- gpu_partners allows p_<group>_gpu and short_gpu.

Default QOS in Partner Partitions

You do not need to specify --qos when submitting to a partner partition. Each partner project member has a partition-specific default:

Submitting to…	Default QOS
`-p compute_partners`	`p_<group>`
`-p gpu_partners`	`p_<group>_gpu`
`-p compute` (general)	`normal` — non-partner work is unchanged
`-p gpu` (general)	`gpu` — non-partner work is unchanged

This means submitting to your partner partition automatically uses your partner allocation. To opt into a short test job on partner hardware, override the default with --qos=short (CPU) or --qos=short_gpu (GPU).
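As a quick reference, the default-QOS table can be expressed as a lookup. This is a sketch of the documented defaults only; the function name `default_qos` is hypothetical and nothing here queries Slurm.

```shell
# Return the QOS a partner project member gets when --qos is omitted,
# following the table above.
default_qos() {
  group=$1
  partition=$2
  case "$partition" in
    compute_partners) echo "p_${group}" ;;
    gpu_partners)     echo "p_${group}_gpu" ;;
    compute)          echo "normal" ;;
    gpu)              echo "gpu" ;;
    *) echo "unknown partition: $partition" >&2; return 1 ;;
  esac
}

default_qos smithlab compute_partners   # prints p_smithlab
```

Passing --qos=short or --qos=short_gpu at submit time overrides whatever this lookup would return for the partner partitions.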

Automatic Hardware Placement

Partner projects contribute specific hardware: particular CPU generations (architectures), GPU models, or both. When you run under your partner QOS, the scheduler can automatically steer your job onto hardware matching what your project actually purchased, so your priority allocation is spent on nodes of the architecture your project contributed rather than scattered across the partner partition.

This happens by default on the partner partitions. A job submitted to compute_partners / gpu_partners uses your partner QOS unless you specify a different one, and the scheduler applies the placement automatically — you do not need to name --qos. To opt out (e.g. for a quick test on any available partner node), specify a non-partner QOS explicitly: --qos=short (CPU) or --qos=short_gpu (GPU). You can always pick specific hardware yourself with --constraint (CPU) or --gres=gpu:<model> (GPU) regardless of QOS.

CPU architecture

On compute_partners under your partner QOS (the default there), if you give no --constraint of your own, the scheduler adds a --constraint for the most capable CPU architecture your project purchased (and accounts for it in billing). If your project purchased more than one architecture, a note printed at submit time names the choice and the alternatives:

Note: QOS 'p_smithlab' defaulted --constraint to 'genoa' (most capable purchased).
Override with --constraint=<arch>; allowed: genoa, skylake.

To run on a different architecture your project purchased, set --constraint yourself — the scheduler leaves an explicit constraint untouched:

#SBATCH --partition=compute_partners
#SBATCH --constraint=skylake     # use the project's skylake nodes instead of the genoa default

If your project purchased only one CPU architecture, that architecture is pinned automatically and no note is printed.
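The defaulting rule can be modelled in a few lines. This is a toy model of the behaviour described above, not the scheduler's actual logic: the function name `effective_constraint` is hypothetical, the purchased architectures are assumed to be passed most capable first, and the note text is a stand-in for the real message.

```shell
# $1     = the user's own --constraint value (empty string if none given)
# $2...  = purchased architectures, most capable first (assumed ordering)
effective_constraint() {
  user=$1
  shift
  if [ -n "$user" ]; then
    # An explicit constraint is left untouched.
    echo "$user"
    return 0
  fi
  # No constraint given: default to the most capable purchased architecture.
  if [ "$#" -gt 1 ]; then
    # Illustrative note only; the real submit-time wording differs.
    echo "note: defaulted to '$1'; allowed: $*" >&2
  fi
  echo "$1"
}

effective_constraint "" genoa skylake        # prints genoa (note on stderr)
effective_constraint skylake genoa skylake   # prints skylake
```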

GPU model

On gpu_partners under your partner QOS (the default there), GPU model handling mirrors the CPU case, except that the restriction to purchased models is enforced (GPU models are tracked only as generic resources, not as node features, so they cannot be expressed with --constraint):

- Request an untyped GPU (--gres=gpu:N) and the scheduler fills in your project's most capable purchased model (e.g. rewriting --gres=gpu:1 to --gres=gpu:h200:1) and prints a note listing the alternatives.
- Request a specific model (--gres=gpu:model:N) and it must be one your project purchased; any other model is rejected at submit time.

To run a non-purchased GPU model on partner GPU nodes, opt out of the partner QOS with --qos=short_gpu.

#SBATCH --partition=gpu_partners
#SBATCH --gres=gpu:h100:2         # h100 must be in your project's purchase, else rejected
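The two GPU rules (fill in an untyped request, reject a non-purchased model) can be sketched as follows. This is a toy model for illustration, not the scheduler's implementation: the function name `effective_gres` is hypothetical, purchased models are assumed to be passed most capable first, and a typed request without a count (`gpu:h100`) is not handled.

```shell
# $1     = requested gres, either gpu:N or gpu:<model>:N
# $2...  = purchased GPU models, most capable first (assumed ordering)
effective_gres() {
  req=$1
  shift
  case "$req" in
    gpu:*:*) model=${req#gpu:}; model=${model%%:*}; count=${req##*:} ;;
    gpu:*)   model=""; count=${req#gpu:} ;;
    *) echo "not a GPU request: $req" >&2; return 2 ;;
  esac
  if [ -z "$model" ]; then
    # Untyped request: fill in the most capable purchased model.
    echo "gpu:$1:$count"
    return 0
  fi
  for m in "$@"; do
    if [ "$m" = "$model" ]; then
      echo "gpu:$model:$count"   # purchased model: accepted unchanged
      return 0
    fi
  done
  echo "model '$model' not purchased; allowed: $*" >&2
  return 1
}

effective_gres gpu:1 h200 h100        # prints gpu:h200:1
effective_gres gpu:h100:2 h200 h100   # prints gpu:h100:2
```

In this model, a request such as `gpu:a100:1` against the same purchase list returns a non-zero status, mirroring the submit-time rejection.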

Which hardware did my project purchase?

The allowed architectures and GPU models appear in the submit-time notes above, and you can also read them directly from your partner QOS's group resource limits:

sacctmgr show qos p_<group>     format=name,grptres
sacctmgr show qos p_<group>_gpu format=name,grptres

The gres/cpu:<arch>=N and gres/gpu:<model>=N entries in GrpTRES name the architectures and GPU models (with counts) your project contributed. Those names are exactly the values accepted by --constraint (CPU) and --gres=gpu:<model> (GPU). See Partitions and Resources for the cluster-wide list of CPU architectures and GPU models.
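To pull just the hardware names out of a GrpTRES value, a small parser is enough. The sketch below assumes the entry format described above (`gres/cpu:<arch>=N`, `gres/gpu:<model>=N`, comma-separated); the function name `purchased_hw` and the sample string are hypothetical.

```shell
# Print "kind name count" for every gres/cpu:* and gres/gpu:* entry in a
# comma-separated GrpTRES string; other TRES entries are ignored.
purchased_hw() {
  printf '%s\n' "$1" | tr ',' '\n' |
    awk -F'[/:=]' '$1 == "gres" && ($2 == "cpu" || $2 == "gpu") { print $2, $3, $4 }'
}

# Invented sample value, for illustration only.
purchased_hw "cpu=192,gres/cpu:genoa=128,gres/cpu:skylake=64"
```

On the cluster, the argument could come from something like `sacctmgr -nP show qos p_smithlab format=grptres`, assuming that output is the bare GrpTRES string.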

Submitting Partner Jobs

CPU partner job (uses `p_<group>` by default)

#!/bin/bash
#SBATCH --job-name=partner_cpu
#SBATCH --output=cpu.out.%j
#SBATCH --error=cpu.err.%j
#SBATCH --partition=compute_partners
#SBATCH --ntasks=16
#SBATCH --mem=32G
#SBATCH --time=24:00:00

./my_program

GPU partner job (uses `p_<group>_gpu` by default)

#!/bin/bash
#SBATCH --job-name=partner_gpu
#SBATCH --output=gpu.out.%j
#SBATCH --error=gpu.err.%j
#SBATCH --partition=gpu_partners
#SBATCH --gres=gpu:h100:1
#SBATCH --ntasks=1
#SBATCH --mem=80G
#SBATCH --time=08:00:00

module load cuda
./train.py

Quick test on partner hardware (overrides the default)

#SBATCH --partition=compute_partners
#SBATCH --qos=short
#SBATCH --time=01:00:00

Without an explicit --qos, the partner QOS is used; with --qos=short (or --qos=short_gpu), the job runs under the short-job allocation instead.

Fair Share for Partners

Partner projects accumulate scheduling priority faster than non-partner projects of equal age. The mechanism, in brief:

- Each partner project is granted an elevated fair share weighted by the resources it has contributed (CPU cores and GPUs, by model).
- That share rolls up the account hierarchy (project → department → college → institution), so partner activity also raises the priority of the units containing it.
- Non-partner accounts retain a baseline share, so partner contributions do not depress non-partner priority below its normal level.

The practical effect: when a partner job and a non-partner job are submitted at about the same time, the partner job is generally scheduled first, especially on the partner partitions, where the partner QOS also adds priority. See Priority and Fair Share for how fair share factors into Slurm's overall multifactor priority.

Common Issues

Problem	Cause	Solution
Job rejected: "Invalid qos specification"	Specified a QOS not allowed in the chosen partition	On partner partitions, only `p_<group>` / `p_<group>_gpu` and `short` / `short_gpu` are allowed. Drop `--qos` to use the default.
Job rejected: "account is the CPU half but this is a GPU job" (or vice versa)	Explicit `--account=<group>_cpu` with a GPU partition, or `--account=<group>_gpu` with a CPU partition	Use `--account=<group>` (unsuffixed) and let the scheduler pick the right half, or match the suffix to the partition.
Job pending with reason "QOSMaxJobsPerUserLimit" or similar	Hit a limit on the partner QOS	Wait for running partner jobs to finish, or submit non-priority work to `compute` / `gpu`.
Want to run on partner hardware without consuming partner allocation	Need a short-job QOS	Use `--qos=short` on `compute_partners` or `--qos=short_gpu` on `gpu_partners`.
Job rejected: "QOS '…' may only use GPU model(s): …"	Requested a GPU model your project did not purchase while under your partner GPU QOS (the default on `gpu_partners`)	Request one of the models in the error's allowed list (`--gres=gpu:<model>:N`); drop the model (`--gres=gpu:N`) to take your project's most capable purchased model automatically; or use `--qos=short_gpu` to run a different model on any partner GPU node.
Partner job did not land on the nodes my project purchased	The job opted out of the partner QOS (e.g. `--qos=short` / `--qos=short_gpu`), used the general `compute`/`gpu` partition, or set its own `--constraint`	Submit to `compute_partners` / `gpu_partners` without overriding `--qos` so the partner QOS and its automatic placement apply; or set `--constraint` / `--gres=gpu:<model>` to the purchased hardware yourself.

Partitions and Resources — full partition list with hardware overlap diagrams.
Priority and Fair Share — multifactor priority and the Partner Priority section.
GPU Jobs — GPU type selection and request syntax.
HPC Partner Program — partnership program overview and how to join.
