High Performance Computing | Understanding the Cluster

What is a Linux Cluster?

Hazel is a collection of commodity compute servers (primarily Lenovo two-socket servers with Intel Xeon processors) with a few serving as login or service nodes and most serving as compute nodes. These servers are connected by high-speed networks and share common storage, allowing them to work together as a single system for running computational workloads.

Cluster Architecture

Diagram of the Linux cluster organization. One login node connects upward to a network cloud labeled Internet and downward to a network cloud labeled HPC Private Network. Below the private network cloud are boxes representing compute nodes, each connected upward to the private network and downward to a message passing network cloud. A cylinder representing storage is attached to the side of the private network.

Node Types

Login Nodes

The login nodes are accessible from the broader network (Internet or campus network) and provide the entry point to the cluster. Login nodes are shared by all users and are intended only for:

Editing files and scripts
Compiling code
Submitting and monitoring batch jobs

No resource-intensive processes may be run on login nodes. Login nodes are a shared resource subject to the Acceptable Use Policy.

Compute Nodes

Compute nodes are where jobs are executed. They should not be accessed directly. Jobs are submitted to the scheduler from the login node, and the scheduler allocates compute nodes and runs jobs there.

Compute nodes have two network connections (HPC private network and HPC message passing network) but do not have a direct connection to any external network. Access to locations outside the cluster must be specifically configured via a proxy server.

Cluster Networks

HPC Private Network

The HPC private network connects all the cluster nodes together. It is primarily used for job control and access to storage. On Hazel this is an Ethernet network with a 100 Gbps core and 25 Gbps connections to the nodes.

HPC Message Passing Network

The HPC message passing network is dedicated to communication between job tasks, such as distributed memory parallel jobs using Message Passing Interface (MPI) communication. On Hazel this is an InfiniBand network with multiple 200 Gbps HDR (High Data Rate) links forming the core and 100 Gbps EDR (Enhanced Data Rate) links to the nodes.
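To put the quoted link speeds in perspective, the arithmetic below estimates ideal transfer times at line rate. It is a rough sketch only: it assumes decimal units (1 Gbps = 10^9 bits per second) and ignores protocol overhead and storage speed, so real transfers will be slower. The helper name `transfer_seconds` is made up for this illustration and is not a tool provided on Hazel.

```shell
#!/bin/sh
# Ideal time in seconds to move BYTES over a link running at GBPS gigabits
# per second. Assumes decimal units and no protocol or storage overhead.
transfer_seconds() {
    awk -v bytes="$1" -v gbps="$2" \
        'BEGIN { printf "%.0f\n", bytes * 8 / (gbps * 1000000000) }'
}

one_tb=1000000000000   # 1 TB = 10^12 bytes

echo "1 TB over a  25 Gbps Ethernet node link:   $(transfer_seconds "$one_tb" 25) s"
echo "1 TB over a 100 Gbps InfiniBand node link: $(transfer_seconds "$one_tb" 100) s"
```

Under these assumptions the two lines report 320 s and 80 s respectively.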

Storage Overview

There are several different storage areas on Hazel, each with different capacities and intended purposes. For full details, see the Storage documentation.

Location	Path	Purpose	Backed Up?
Home Directory	`/home/user_name`	Scripts, small applications, and temporary files needed by the scheduling system. 15 GB and 10K file quota.	Yes (daily backups and snapshots)
Scratch Directory	`/share/group_name/user_name`	Data for running jobs and job results. 20 TB and 1M file quota per project.	No. Files not accessed for 30 days are automatically deleted.
Application Directory	`/usr/local/usrapps/group_name`	User-installed software. 100 GB and 250K files default quota.	Yes
Research Storage	(varies)	Long-term research data accessible from all Hazel nodes.	Yes
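The quotas and the 30-day scratch purge in the table can be checked with standard tools. The sketch below uses only `find`, `du`, `wc`, and `awk`; the function names are hypothetical, and it assumes the file quota counts regular files (it may also count directories). Hazel may provide its own quota-reporting command, which would be authoritative.

```shell
#!/bin/sh
# Count regular files under a directory.
# Assumption: the file quota counts regular files only.
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Disk usage of a directory in KiB, as reported by du.
used_kib() {
    du -sk "$1" | awk '{ print $1 }'
}

# List files whose last access is more than DAYS full days in the past.
stale_files() {
    find "$1" -type f -atime +"$2"
}

# Example calls on the cluster (paths as in the table above):
#   count_files "$HOME"                          # compare with the 10K file quota
#   used_kib "$HOME"                             # compare with the 15 GB quota
#   stale_files /share/group_name/user_name 23   # files approaching the 30-day purge
```

Listing files a week before the purge threshold, as in the last example call, leaves time to copy results elsewhere; the value 23 is an arbitrary choice for illustration.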

Learn about transferring files to and from the cluster.

Authorization and Access

Accounts must be authorized to connect to the Hazel Linux Cluster. Authorization is organized by projects, which facilitate file sharing between project members, cluster usage accounting, and various resource limits and quotas.

Research Projects

Each research project is owned by a faculty member, who manages project membership through a self-service web application: https://research.oit.ncsu.edu.

Course Projects

Course instructors may request a course project with access for students enrolled in the course by sending email to help@ncsu.edu.

For full details on obtaining access, see the Getting HPC Access page.

Connecting to Hazel

Web Browser (Open OnDemand)

A web-based interface, Open OnDemand, is available to all authorized Hazel accounts regardless of your institution. Use your web browser to connect to https://servood.hpc.ncsu.edu. Open OnDemand provides a web-based terminal session on a Hazel login node, along with additional functions and applications.

See Open OnDemand documentation for more details.

Terminal Window (SSH)

From a terminal window on Linux, Mac, or Windows, you can connect to a shared Hazel login node using the secure shell command `ssh user_name@login.hpc.ncsu.edu`, where `user_name` is replaced by your authorized Hazel user name.

See the full login documentation for detailed instructions.

Applications on Hazel

HPC Staff-Maintained Applications

HPC staff maintain a set of applications on Hazel. To use one, you typically set up your environment with the `module` command, create a job script that runs the application, and then submit the job script to the scheduling system.
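The three steps above (set up the environment, write a job script, submit it) can be sketched as follows for Slurm, the scheduler Hazel uses. This is an illustration rather than Hazel's documented procedure: the file name `hello_job.sh`, the module name `app_name`, and the resource values are placeholders, and the `#SBATCH` lines use standard Slurm options chosen only as an example.

```shell
#!/bin/sh
# Steps 1 and 2: write a small job script. Quoting 'EOF' keeps the shell from
# expanding $(hostname) now; it runs later, on the compute node.
cat > hello_job.sh <<'EOF'
#!/bin/bash
# Name shown in the queue
#SBATCH --job-name=hello
# A single task
#SBATCH --ntasks=1
# Wall-clock limit, HH:MM:SS
#SBATCH --time=00:05:00
# Output file; %j expands to the job ID
#SBATCH --output=hello_%j.out

# Placeholder: load the module your application needs, for example
# module load app_name
echo "Running on $(hostname)"
EOF

# Step 3: submit from a login node. Away from the cluster sbatch is absent,
# so the script only reports that.
if command -v sbatch >/dev/null 2>&1; then
    sbatch hello_job.sh
else
    echo "sbatch not found: copy hello_job.sh to Hazel and submit it there"
fi
```

Writing the script with a here-document keeps the example in one piece; in practice the job script is usually edited as a separate file and submitted with `sbatch`.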

User-Installed Applications

Users can install their own applications in their project's application directory. The primary tools recommended for installing applications are Conda and Apptainer containers.

See the software installation FAQ for general guidance.

The Job Scheduler

Applications must not be executed directly on the login nodes. Anything that uses significant CPU or memory resources must be run via the job scheduler.

The job scheduler manages all compute resources on the cluster. You submit a job script describing what to run and what resources you need, and the scheduler queues your job, allocates compute nodes when resources are available, and runs your job there.

Hazel uses the Slurm workload manager for scheduling jobs. Key scheduler concepts include:

Batch jobs: Scripts submitted to the scheduler that run unattended on compute nodes
Interactive jobs: Sessions on compute nodes for testing and debugging
Partitions: Groups of nodes organized by hardware type (CPU, GPU) and access level
QOS (Quality of Service): Controls job priority and resource limits
Fairshare: Ensures equitable access to resources across all users over time
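The concepts listed above map onto standard Slurm commands, sketched below. The commands and options are part of Slurm itself, but partition and QOS names are site-specific and are not assumed here, and the `slurm` wrapper function is a hypothetical helper that only exists so the sketch is harmless when run away from the cluster.

```shell
#!/bin/sh
# Run a Slurm command when it is installed; otherwise only print it.
slurm() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "(not on a Slurm system) $*"
    fi
}

me=$(id -un)

# Partitions: one summary line per partition with node counts.
slurm sinfo --summarize

# Batch and interactive jobs: your queued and running jobs.
slurm squeue --user="$me"

# QOS: the quality-of-service definitions known to the scheduler.
slurm sacctmgr show qos

# Fairshare: your current fairshare standing.
slurm sshare --users="$me"

# Interactive jobs: a common Slurm pattern is to request a shell on a compute
# node. Left commented out because it opens a session; Hazel may prescribe a
# different method.
# srun --ntasks=1 --time=00:30:00 --pty bash
```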

Learn how to submit and manage jobs with Slurm

Basic Terminology

Term	Definition
Node	An individual server (computer) in the cluster
Core	A processing unit within a CPU; each node has multiple cores
Job	A unit of work submitted to the scheduler to run on compute nodes
Batch script	A text file containing scheduler directives and commands for your job
Partition	A group of nodes with shared characteristics (e.g., CPU nodes, GPU nodes)
Module	A package that configures your environment to use a specific software application
MPI	Message Passing Interface - a standard for parallel programs that communicate across nodes
OpenMP	A parallel programming model for shared-memory (single-node) parallelism using threads
GPU	Graphics Processing Unit - specialized hardware for parallel computation, used for machine learning and scientific computing
NUMA	Non-Uniform Memory Access - a memory architecture where access time depends on memory location relative to a processor
InfiniBand	A high-speed network interconnect used for communication between nodes in parallel jobs
Scratch	High-performance temporary storage for job data; not backed up
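As an illustration of how the terms Core and OpenMP relate in practice, the sketch below chooses an OpenMP thread count from the cores the scheduler assigned. It assumes the job requested cores with Slurm's `--cpus-per-task` option, which sets the `SLURM_CPUS_PER_TASK` environment variable; the function name `omp_threads` is hypothetical.

```shell
#!/bin/sh
# Use the cores Slurm assigned to the task when SLURM_CPUS_PER_TASK is set;
# otherwise fall back to a single thread.
omp_threads() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

OMP_NUM_THREADS=$(omp_threads)
export OMP_NUM_THREADS
echo "OpenMP threads: $OMP_NUM_THREADS"

# Logical CPUs online on the machine running this script, for comparison.
echo "Logical CPUs on this machine: $(getconf _NPROCESSORS_ONLN)"
```

Matching the thread count to the allocated cores avoids oversubscribing a node that is shared with other jobs.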
