Linux Cluster organization used for Hazel:
This is a greatly simplified diagram of the Linux cluster organization used for Hazel. The cluster contains two independent networks: one connects the nodes to storage, and the second carries message passing traffic between job tasks.
The login node is accessible from the broader network (the campus network and the Internet) and provides a method to submit jobs to the compute nodes. No jobs should be run on the login node. It is a shared resource intended to be used only to submit and monitor jobs.
Compute nodes are where jobs are executed (that is, where computations are performed). Compute nodes should not be accessed directly: jobs are submitted to the scheduler from the login node, and only the scheduler should access the compute nodes and run jobs there.
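As a hedged illustration of this workflow, the sketch below submits a small job script from the login node. The scheduler is not named in this section; the sketch assumes an LSF-style scheduler, and the bsub command, its directives, and the program name are assumptions rather than details taken from this page.

```python
import subprocess

# Job script handed to the scheduler; the #BSUB directives and the program
# name are illustrative assumptions, not taken from this page.
job_script = """#!/bin/bash
#BSUB -n 4            # request 4 cores
#BSUB -W 30           # 30-minute wall-clock limit
#BSUB -o job.%J.out   # stdout file (%J expands to the job ID)
./my_program
"""

# LSF-style schedulers read the job script from standard input.
result = subprocess.run(["bsub"], input=job_script, text=True,
                        capture_output=True, check=True)
print(result.stdout)  # e.g. "Job <12345> is submitted to queue <...>"
```

The same script could equally be submitted by hand; the point is that the scheduler, not the user, decides which compute node runs the work.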
Note that compute nodes have two network connections: the HPC private network and the HPC message passing network. Compute nodes do not have a connection to any external network; that is, they are not connected to the campus network or the Internet. Access to any location outside the cluster has to be specifically configured via a proxy server that routes traffic from the compute nodes to that location.
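As a minimal sketch of what that looks like from inside a job, the snippet below points the standard proxy environment variables at a proxy before making an outbound request. The proxy hostname, port, and URL are placeholders, not the actual values configured on Hazel.

```python
import os
import urllib.request

# Placeholder proxy host and port; on a real cluster these values are
# configured by the HPC staff, not chosen by the user.
os.environ["http_proxy"] = "http://proxy.example.com:3128"
os.environ["https_proxy"] = "http://proxy.example.com:3128"

# urllib honors the proxy environment variables by default, so this request
# from a compute node is routed through the proxy to the outside world.
with urllib.request.urlopen("https://example.com/") as response:
    print(response.status)
```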
The HPC private network connects all the cluster nodes together. It is used primarily for job control and access to storage. In Hazel this is an Ethernet network with a 100Gbps Ethernet core and 25Gbps connections to the nodes. As of May 2025 there are still some older nodes (FlexChassis hardware) that have 10Gbps private network connections.
The HPC message passing network is dedicated to communication between job tasks, such as distributed-memory parallel jobs that use Message Passing Interface (MPI) communication. In Hazel this is an InfiniBand network with multiple 200Gbps (High Data Rate, HDR) links forming the core and 100Gbps (Enhanced Data Rate, EDR) links to the nodes.
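To make the role of this network concrete, here is a minimal distributed-memory sketch using the mpi4py binding (an assumption; any MPI implementation behaves the same way). Each task exchanges a message with its neighbors in a ring; between nodes, this traffic travels over the InfiniBand fabric.

```python
from mpi4py import MPI  # assumes an MPI library and mpi4py are available

comm = MPI.COMM_WORLD
rank = comm.Get_rank()  # this task's ID within the job
size = comm.Get_size()  # total number of tasks in the job

# Each task sends its rank to the next task in a ring and receives from the
# previous one; between nodes this traffic uses the message passing network.
dest = (rank + 1) % size
source = (rank - 1) % size
received = comm.sendrecv(rank, dest=dest, source=source)
print(f"task {rank} of {size} received {received} from task {source}")
```

Launched with, for example, mpirun -n 4 python ring.py, each of the four tasks prints the rank it received from its neighbor.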
The parallel file system (GPFS) is connected to the HPC private network via multiple 100Gbps links. There are two Lenovo DSS storage arrays, each with two NSD servers and two JBODs, that together form the file system where the scratch directories are located. This is the file system that should be used to hold data for running jobs. However, this storage is not backed up, and files that have not been accessed for 30 days are automatically deleted.
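Because of the 30-day purge, it can be useful to check which files are approaching deletion before assuming data is still in scratch. The sketch below (the scratch path is a placeholder) walks a directory tree and flags files whose last access time is older than 30 days.

```python
import os
import time

SCRATCH = "/path/to/scratch"  # placeholder; substitute your scratch directory
CUTOFF = time.time() - 30 * 24 * 3600  # 30 days ago, in seconds

# Walk the scratch tree and flag files whose last access time (atime) is
# older than the cutoff, i.e. candidates for the automatic 30-day purge.
for dirpath, _dirnames, filenames in os.walk(SCRATCH):
    for name in filenames:
        path = os.path.join(dirpath, name)
        if os.stat(path).st_atime < CUTOFF:
            print(f"at risk of purge: {path}")
```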
The storage for the home and application directories is not shown on this diagram. Currently these directories are located on the same physical storage (a NetApp FAS 8700) as Research Storage. This storage is also connected to Hazel by multiple 100Gbps Ethernet links. However, this network-attached storage is mounted using the Network File System (NFS) protocol, which has performance limitations and is not compatible with parallel I/O operations. These directories are backed up and also have periodic snapshots that enable self-service recovery of accidentally deleted files.
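As a hedged illustration of the parallel I/O that GPFS supports and the NFS-mounted directories do not, the sketch below uses MPI-IO (again via mpi4py, an assumption) to have every task write its own slice of a single shared file in scratch; the output path is a placeholder.

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Each task owns one contiguous slice of the shared output file.
data = np.full(1024, rank, dtype=np.int32)
offset = rank * data.nbytes

# Collective parallel write to a single file: efficient on GPFS (scratch),
# but not something the NFS-mounted home directories are suited for.
fh = MPI.File.Open(comm, "/path/to/scratch/output.bin",  # placeholder path
                   MPI.MODE_CREATE | MPI.MODE_WRONLY)
fh.Write_at_all(offset, data)
fh.Close()
```

This is why job data belongs in scratch: all tasks can write the file concurrently, whereas the same pattern against a home directory would be serialized by NFS.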