High Performance Computing | Job Priority and Fairshare

How Job Priority Works

When multiple jobs are waiting for resources, Slurm uses a priority score to decide which job runs first. Higher-priority jobs are scheduled before lower-priority jobs.

The priority score is calculated from several factors:

Fairshare: Your recent usage compared to your allocation
QOS priority: Bonus from Quality of Service settings
Job age: How long the job has been waiting
Job size: Number of resources requested
Partition priority: Priority assigned to the partition
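
As a rough illustration of how these factors combine: Slurm's multifactor priority plugin scales each factor to a value between 0 and 1, multiplies it by a site-configured weight, and sums the products. The sketch below uses made-up weights and factor values chosen only to keep the arithmetic easy to follow; they are not this cluster's settings, and job_priority is a hypothetical helper name.

```shell
# Hypothetical helper: weighted sum of normalized priority factors.
# Arguments are weight/factor pairs: w1 f1 w2 f2 ...
job_priority() {
    awk 'BEGIN {
        total = 0
        for (i = 1; i < ARGC; i += 2) total += ARGV[i] * ARGV[i + 1]
        printf "%.0f\n", total
    }' "$@"
}

# Assumed weights: fairshare 10000, age 2000, QOS 1000 (illustrative only).
# Assumed factors: fairshare 0.9, age 0.5, QOS 0.5.
job_priority 10000 0.9 2000 0.5 1000 0.5   # prints 10500
```

Because fairshare carries the largest weight in this sketch, a change in the fairshare factor moves the total far more than the same change in any other factor.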

Fairshare

Fairshare is the primary factor in job priority. It ensures that all users get their fair portion of cluster resources over time.

How fairshare works

Each account has a share allocation representing its portion of the cluster
Slurm tracks your recent usage (CPU-hours consumed)
If you've used less than your share, your priority increases
If you've used more than your share, your priority decreases
Usage decays over time, so heavy past usage won't penalize you forever
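
The decay in the last point can be pictured as a half-life: after each half-life period, recorded usage counts half as much. This is a minimal sketch that assumes a 7-day half-life purely for illustration. The cluster's actual value is set by the PriorityDecayHalfLife parameter in the Slurm configuration and is not stated on this page, and decayed_usage is a hypothetical helper name.

```shell
# Hypothetical helper: usage remaining after exponential decay.
# Arguments: raw usage, days elapsed, half-life in days (assumed value).
decayed_usage() {
    awk -v u="$1" -v d="$2" -v h="$3" \
        'BEGIN { printf "%.0f\n", u * exp(-log(2) * d / h) }'
}

decayed_usage 150000 7 7    # one half-life later: prints 75000
decayed_usage 150000 14 7   # two half-lives later: prints 37500
```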

Fairshare formula

Your fairshare factor (between 0 and 1) is calculated as:

Fairshare Factor = 2^(-EffectiveUsage / ShareAllocation)

Where:

EffectiveUsage is your recent resource consumption (with decay)
ShareAllocation is your account's assigned share

A fairshare factor of 1.0 means you haven't used any resources recently (highest priority). A factor near 0 means you've used significantly more than your share (lowest priority).
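
To get a feel for the formula, the sketch below evaluates it for a few hypothetical usage values against a share of 0.005. The helper name fairshare_factor and the input numbers are illustrative assumptions. Values that sshare reports on a real system may not match this simple calculation exactly, because the scheduler's configuration can change how the factor is derived.

```shell
# Hypothetical helper: 2^(-EffectiveUsage / ShareAllocation).
# exp(-log(2) * x) is used as a portable way to compute 2^(-x) in awk.
fairshare_factor() {
    awk -v u="$1" -v s="$2" \
        'BEGIN { printf "%.3f\n", exp(-log(2) * u / s) }'
}

fairshare_factor 0     0.005   # no recent usage:      prints 1.000
fairshare_factor 0.005 0.005   # usage equals share:   prints 0.500
fairshare_factor 0.010 0.005   # usage is twice share: prints 0.250
```

Under this formula, using exactly your share gives a factor of 0.5, and each additional multiple of your share halves the factor again.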

Check your fairshare

$ sshare -u $USER
             Account       User  RawShares  NormShares    RawUsage  EffectvUsage  FairShare
-------------------- ---------- ---------- ----------- ----------- ------------- ----------
       myproject_cpu    unityID        100    0.005000      150000      0.003500   0.750000
       myproject_gpu    unityID        100    0.001000       50000      0.001100   0.870000

Each project appears as two rows: myproject_cpu for CPU jobs (submitted to the compute or compute_partners partitions) and myproject_gpu for GPU jobs (submitted to the gpu or gpu_partners partitions). Slurm tracks usage and priority for each side independently, so heavy CPU usage doesn't reduce your priority for GPU work, and vice versa.

Key columns:

NormShares: Your share as a fraction of total cluster shares
EffectvUsage: Your effective usage as a fraction of total usage
FairShare: Your fairshare factor (higher is better)
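
If you want to check the FairShare column from a script, one possible approach is to ask sshare for pipe-delimited output and filter it. The sketch assumes sshare's -n (no header), -P (parsable, no trailing delimiter), and -o (format) options. The helper name low_fairshare and the 0.8 threshold are arbitrary choices for illustration.

```shell
# Hypothetical helper: read "Account|FairShare" lines on stdin and print
# the accounts whose fairshare factor is below a threshold (default 0.5).
low_fairshare() {
    awk -F'|' -v t="${1:-0.5}" '
        { gsub(/^ +/, "", $1) }                       # drop hierarchy indentation
        $2 != "" && ($2 + 0) < (t + 0) { print $1, $2 }'
}

# On the cluster, the input would come from something like:
#   sshare -u "$USER" -n -P -o Account,FairShare | low_fairshare 0.8
# Here the two sample rows from above stand in for real output:
printf '%s\n' 'myproject_cpu|0.750000' 'myproject_gpu|0.870000' | low_fairshare 0.8
# prints: myproject_cpu 0.750000
```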

View detailed fairshare

# Show full account hierarchy
sshare -a

# Show one half of your project
sshare -A myproject_cpu
sshare -A myproject_gpu

# Show all users in one half
sshare -A myproject_cpu -a

QOS Priority

Each QOS has a priority value that adds to your job's base priority:

QOS	Priority Bonus	Effect
`normal`	0	Standard priority
`long`	0	Standard priority, wall time up to 10 days
`short`	10	Higher priority for short jobs
`gpu`	0	Standard priority for GPU jobs
`short_gpu`	10	Higher priority for short GPU jobs
Partner QOS	Varies	Priority based on partner allocation

The short and short_gpu QOS provide a priority boost for jobs under 2 hours, plus access to idle partner hardware.

Job Age

Jobs gain priority the longer they wait in the queue. This prevents starvation, in which low-priority jobs never run.

The age factor increases linearly up to a maximum. After reaching maximum age priority, the job won't gain additional priority from waiting.
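
The linear ramp and cap can be sketched as follows. The 168-hour (7-day) maximum is an assumption for illustration. The real cap is set by the PriorityMaxAge parameter in the Slurm configuration, and age_factor is a hypothetical helper name.

```shell
# Hypothetical helper: age factor that grows linearly with wait time
# and is capped at 1 once the maximum age is reached.
# Arguments: hours waited, maximum age in hours (assumed value).
age_factor() {
    awk -v w="$1" -v m="$2" \
        'BEGIN { f = w / m; if (f > 1) f = 1; printf "%.2f\n", f }'
}

age_factor 84 168    # halfway to the cap: prints 0.50
age_factor 336 168   # past the cap:       prints 1.00
```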

Checking Job Priority

View priority of pending jobs

$ sprio -u $USER
  JOBID     USER   PRIORITY        AGE  FAIRSHARE    QOS
 123456  unityID      10500       1000       9000    500
 123457  unityID      10200        700       9000    500

The PRIORITY column shows the total priority score; in this example it is the sum of the AGE, FAIRSHARE, and QOS columns. Jobs with higher values are scheduled first.
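
To see how much of a job's score comes from fairshare, the sample output above can be post-processed with awk. This sketch assumes the exact column layout shown in the sample (JOBID, USER, PRIORITY, AGE, FAIRSHARE, QOS). Real sprio output may include other columns, so the field numbers would need adjusting. The helper name fairshare_portion is hypothetical.

```shell
# Hypothetical helper: for each job line, print the job ID and the
# percentage of the total priority contributed by the FAIRSHARE column.
fairshare_portion() {
    awk '$1 ~ /^[0-9]+$/ && $3 > 0 { printf "%s %.0f%%\n", $1, 100 * $5 / $3 }'
}

fairshare_portion <<'EOF'
  JOBID     USER   PRIORITY        AGE  FAIRSHARE    QOS
 123456  unityID      10500       1000       9000    500
 123457  unityID      10200        700       9000    500
EOF
# prints:
# 123456 86%
# 123457 88%
```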

View priority breakdown

sprio -j JOBID -l

View priority factor weights

sprio -w

This shows the weights assigned to each priority factor.

Why Is My Job Waiting?

Use squeue to see why your job is pending:

$ squeue -u $USER -o "%.10i %.9P %.20j %.8u %.2t %.10M %.6D %R"
     JOBID PARTITION                 NAME     USER ST       TIME  NODES REASON
    123456   compute             analysis  unityID PD       0:00      4 (Priority)
    123457   compute           preprocess  unityID PD       0:00      1 (Resources)

Common pending reasons

Reason	Meaning	What to do
Priority	Other jobs have higher priority	Wait; use `short` QOS for jobs under 2 hours
Resources	Waiting for requested resources	Wait; consider reducing resource request
QOSMaxCpuPerUserLimit	Hit your CPU limit for this QOS	Wait for running jobs to finish
QOSMaxJobsPerUserLimit	Hit your job count limit	Wait for running jobs to finish
AssocGrpCPURunMinutesLimit	Account's limit on CPU-minutes committed to running jobs has been reached	Wait for running jobs to finish; contact HPC support if it persists
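
With many pending jobs, a per-reason count can be easier to read than the full listing. This is a minimal sketch, assuming squeue's -h (no header), -t PD (pending only), and %r (reason) format options. The helper name count_reasons is hypothetical.

```shell
# Hypothetical helper: count how many input lines carry each reason,
# most frequent first.
count_reasons() {
    awk '{ n[$0]++ } END { for (r in n) print n[r], r }' | sort -rn
}

# On the cluster, the input would come from something like:
#   squeue -u "$USER" -h -t PD -o "%r" | count_reasons
# Sample reasons stand in for real output here:
printf '%s\n' Priority Resources Priority | count_reasons
# prints:
# 2 Priority
# 1 Resources
```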

Estimate start time

squeue -j JOBID --start

Note: Start time estimates are approximate and change as other jobs complete or are submitted.

Improving Your Priority

Use appropriate QOS

For jobs under 2 hours, use short or short_gpu for a priority boost and access to idle partner hardware.

Request only what you need

Smaller jobs are easier to schedule and consume less of your fairshare allocation:

Request only the cores your application can use
Set accurate time limits (jobs ending early return resources)
Request appropriate memory, not maximum
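
A batch script following these points might look like the sketch below. The job name, core count, memory, and time values are placeholders rather than recommendations, and the echo line stands in for your own application command.

```shell
#!/bin/bash
#SBATCH --job-name=analysis      # placeholder name
#SBATCH --partition=compute
#SBATCH --qos=short              # job is expected to finish in under 2 hours
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4        # only as many cores as the program can use
#SBATCH --mem=8G                 # sized to the job, not the node maximum
#SBATCH --time=01:30:00          # realistic limit with some headroom

# Replace this line with your application command.
echo "Running with ${SLURM_CPUS_PER_TASK:-4} cores"
```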

Use job arrays efficiently

For many similar jobs, array jobs are more efficient than individual submissions and count as a single job against limits.
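
As an illustration, a small array job could be written as below. The task range, the throttle of 5 concurrent tasks, and the input file naming are assumptions made for the example. SLURM_ARRAY_TASK_ID is set by Slurm for each array task, and the fallback of 1 only lets the script run outside Slurm for testing.

```shell
#!/bin/bash
#SBATCH --job-name=sweep             # placeholder name
#SBATCH --array=1-20%5               # 20 tasks, at most 5 running at once
#SBATCH --time=00:30:00
#SBATCH --output=sweep_%A_%a.out     # %A = array job ID, %a = task index

# Each task picks its own input file from its array index.
input="input_${SLURM_ARRAY_TASK_ID:-1}.dat"
echo "Task ${SLURM_ARRAY_TASK_ID:-1} would process ${input}"
```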

Spread usage over time

Running many large jobs at once rapidly consumes your fairshare. Spreading jobs over time keeps your priority higher.

Backfill Scheduling

Slurm uses backfill scheduling to improve cluster utilization. Even if your job has lower priority, it may start earlier if:

It fits in a gap before higher-priority jobs can start
It won't delay higher-priority jobs
It has an accurate time limit

Tip: Accurate time limits help backfill scheduling. If your job requests 4 days but only runs 2 hours, you lose backfill opportunities.
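
The effect of the time limit can be shown with a deliberately simplified check: a job can fill a gap only if it needs no more nodes than are idle and its requested time limit is no longer than the gap. The real backfill scheduler considers much more than this, and can_backfill and the numbers below are hypothetical.

```shell
# Hypothetical helper: succeed if a job fits in a scheduling gap.
# Arguments: job nodes, job time limit (min), idle nodes, gap length (min).
can_backfill() {
    [ "$1" -le "$3" ] && [ "$2" -le "$4" ]
}

# A 3-hour gap on 4 idle nodes; the job needs 2 nodes and runs about 2 hours.
if can_backfill 2 5760 4 180; then echo "fits"; else echo "does not fit"; fi
# 4-day limit (5760 min): prints "does not fit"
if can_backfill 2 150 4 180; then echo "fits"; else echo "does not fit"; fi
# 2.5-hour limit (150 min): prints "fits"
```

The scheduler sees only the requested limit, not the job's eventual runtime, so the same 2-hour job is skipped when it asks for 4 days.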

Partner Priority

Research groups that have purchased hardware for the cluster receive:

Dedicated fairshare allocation for their purchased resources, computed independently for CPU and GPU contributions: CPU cores raise priority on the CPU side of the tree, and GPUs raise priority on the GPU side.
Partner QOS with higher priority on their hardware (p_<group> on compute_partners, p_<group>_gpu on gpu_partners).
Access to general cluster resources at standard priority.

See Running Partner Jobs for the user-facing details and HPC Partner Program for program-level information.

Account Hierarchy

Fairshare is calculated hierarchically. The Slurm account tree is split at the root into a CPU side and a GPU side; every project, department, college, and institution exists as both a _cpu and a _gpu twin:

Root
├── cpu
│   └── Institution_cpu
│       └── College_cpu
│           └── Department_cpu
│               └── Project_cpu (account)
│                   └── User
└── gpu
    └── Institution_gpu
        └── College_gpu
            └── Department_gpu
                └── Project_gpu (account)
                    └── User

Each user has associations on both halves. The job-submit plugin picks the correct half based on the partition you choose, so you don't normally need to think about the suffix; it simply appears in sshare, sacct, and sacctmgr output. See Running Partner Jobs for the routing details.

Usage rolls up through each half independently. Heavy CPU usage by one project in your department reduces priority for that department's _cpu branch but leaves the _gpu side unaffected.

Useful Commands Summary

Command	Description
`sshare -u $USER`	Show your fairshare
`sprio -u $USER`	Show priority of your pending jobs
`squeue -u $USER`	Show your jobs and pending reasons
`squeue -j JOBID --start`	Estimate job start time
`sacctmgr show assoc user=$USER`	Show your account associations
