Quick, Quick Start overview of the Hazel Linux Cluster
This guide is intended to provide an overall sense of how the Hazel cluster operates. It is not intended to replace the Quick Start Tutorial or other documentation on this web site.
More in-depth information is available on the pages linked from this short introductory guide.
Basic, essential information covered in this guide:
Research projects are owned by a faculty member. Project members are managed by the faculty member using a self-service web application: https://research.oit.ncsu.edu.
Course instructors may request a course project with access for students enrolled in the course by sending email to oit_hpc@help.ncsu.edu.
A web-based interface, Open OnDemand, is available to all authorized Hazel accounts regardless of your institution. Use your web browser to connect to https://servood.hpc.ncsu.edu. Open OnDemand provides a web-based terminal session on a Hazel login node.
Additional functions and applications are also available from the Open OnDemand portal. See the Open OnDemand documentation for more details.
From a terminal window on Linux, Mac, or Windows, you can connect to a shared Hazel login node using the secure shell command ssh user_name@login.hpc.ncsu.edu, where "user_name" is replaced by your authorized Hazel user name. Currently, only authorized NC State Unity IDs are able to connect using ssh directly.
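For example, a minimal SSH session might look like the following (the Unity ID jdoe is a placeholder; substitute your own):

    # Connect to a shared Hazel login node (replace jdoe with your Unity ID)
    ssh jdoe@login.hpc.ncsu.edu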
There are several different places on Hazel that are available for keeping files. These have different amounts of space available and different intended purposes.
When you initially connect to Hazel, your working directory is your home directory, /home/user_name. There is a 1GB quota, as your home directory is intended for holding only scripts, small applications, and temporary files needed by the scheduling system to run your jobs. Your home directory is backed up daily to another data center. In addition, there are snapshots taken more frequently that can be used to recover accidentally deleted files.
The scratch directory is where data for running jobs should be placed and where results from jobs should be written. Any important files should be moved out of scratch (for example, to Research Storage) at the end of a run. Scratch space is not backed up, and files not accessed for 30 days are automatically deleted.
Your scratch directory is located at /share/group_name/user_name, where group_name is the first group listed in the output of the groups command. Your project has a 20TB scratch directory quota.
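As a sketch, you can locate and change into your scratch directory from the command line like this (the command substitution is just one convenient way to pick up the first group):

    # Show your groups; the first one listed is your project group_name
    groups
    # Change into your scratch directory
    cd /share/$(groups | cut -d' ' -f1)/$USER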
Devising ways to keep files in your scratch directory indefinitely is a violation of the Hazel Acceptable Use Policy.
If you (or your group) are installing your own applications, they should be kept in the application directory (unless they are small enough to fit in your home directory). Each project may request an application directory. Your project's application directory would be located at /usr/local/usrapps/group_name. Projects receive a 100GB default application directory quota.
Requesting an application directory for your project
Research Storage is accessible from all Hazel nodes. The recommended workflow is to copy data from Research Storage to your scratch directory at the beginning of your job script, then, at the end of your run, copy results back from your scratch directory to Research Storage.
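A minimal sketch of that copy-in/copy-out pattern inside a job script might look like the following; the Research Storage path, file names, and program name are all placeholders:

    # Copy input data from Research Storage to scratch (paths are examples)
    cp /path/to/research_storage/input.dat /share/group_name/user_name/
    cd /share/group_name/user_name
    # Run the job
    ./my_program input.dat > results.out
    # Copy results back to Research Storage
    cp results.out /path/to/research_storage/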
There are a couple of ways to move files to or from Hazel using your web browser.
The file browser widget available in Open OnDemand provides an easy way to move files up to 10GB between your local computer and Hazel.
Globus is the preferred tool for moving larger files. It is able to restart interrupted network sessions and computes a checksum at the end of the transfer to ensure the data was moved correctly. The location you are moving data to or from will need to be connected to a Globus Connect Server or have Globus Connect Personal installed.
From a terminal window, the sftp command can be used to transfer files between your local computer and Hazel.
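For example, an sftp session might look like the sketch below (the Unity ID, paths, and file names are placeholders):

    # Open an sftp session to a Hazel login node
    sftp jdoe@login.hpc.ncsu.edu
    sftp> put local_data.tar.gz /share/group_name/jdoe/
    sftp> get /share/group_name/jdoe/results.out
    sftp> quit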
There is a set of applications maintained on Hazel by the HPC staff. Generally, to use one of these applications you will use the module command to set up your environment to access the application, create a job script to execute the application, and then submit the job script to the scheduling system.
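A typical sequence with the module command might look like this sketch (the application name is illustrative; run module avail to see what is actually installed):

    # List available applications
    module avail
    # Load an application into your environment (name is an example)
    module load R
    # Show what is currently loaded
    module list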
Unless the application is small, individually installed applications will be located in your project's application directory.
Best practice is to create a module file for the application so that the module command can be used to configure your environment to run the application.
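As a sketch, if your project keeps its module files in a directory of its own (the modulefiles path below is an assumption, not a fixed convention; see the FAQ linked next), you could make them visible and load one like this:

    # Add your project's module files to the module search path (path is an example)
    module use --append /usr/local/usrapps/group_name/modulefiles
    # Load your individually installed application (name is hypothetical)
    module load myapp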
Individually installed application FAQ
The primary tool recommended for installing applications on Hazel is Conda. Conda will automatically resolve dependencies and select a compatible set of library versions to build your application.
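A minimal sketch of using Conda to build an environment under your project's application directory (the environment path and package names are placeholders):

    # Create a Conda environment under your application directory
    conda create --prefix /usr/local/usrapps/group_name/env_myapp python=3.11 numpy
    # Activate the environment by path
    conda activate /usr/local/usrapps/group_name/env_myapp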
Containers offer a potentially even easier approach as there may be a container already built with your application. See the Apptainer documentation for details about using containers on Hazel.
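For instance, pulling and running a prebuilt container with Apptainer might look like the following (the image is an arbitrary example, not a recommendation):

    # Pull a prebuilt image from Docker Hub and run a command inside it
    apptainer pull docker://rocker/r-base
    apptainer exec r-base_latest.sif R --version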
Hazel uses LSF for resource management and scheduling. LSF is analogous to Slurm, which you may have encountered on other clusters.
For anything other than a trivial job, we strongly recommend creating a job script: a file containing the scheduler options and Linux commands needed to execute your job.
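A minimal sketch of an LSF job script is shown below; the resource values and program name are placeholders to adapt to your own job:

    #!/bin/bash
    #BSUB -n 4              # number of cores
    #BSUB -W 120            # wall-clock time limit, in minutes
    #BSUB -o out.%J         # standard output file (%J expands to the job ID)
    #BSUB -e err.%J         # standard error file
    ./my_program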
Applications must not be executed on the login nodes. Anything that uses significant CPU or memory resources must be run via LSF.
There is a set of queues available to all Hazel users. HPC Partners have additional, dedicated queues available to them and their projects. CPU jobs need to specify the number of cores they will use and the length of time they will run (in wall clock time). Generally it is best to allow LSF to select the queue in which to place your job; if you have access to a partner queue, however, you should generally place your job in that queue.
The bsub command is used to place a job in a queue. Once a job is queued, the bjobs command can be used to monitor the job's progress. An overall view of all jobs currently running or pending on Hazel is provided by the bqueues command.
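Put together, submitting and monitoring a job might look like this sketch (the script name is a placeholder):

    # Submit the job script to LSF
    bsub < myjob.sh
    # Check the status of your own jobs
    bjobs
    # See overall queue activity across the cluster
    bqueues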
Documentation on running CPU jobs on Hazel
Like CPU jobs, GPU jobs must specify the number of CPU cores they will use and for how long. GPU jobs must also specify how many GPUs they will use and how they will use them.
Unlike CPU jobs, GPU jobs must specify a GPU queue to be placed into. The generally available GPU queues are gpu and short_gpu. The short_gpu queue has access to partner GPU nodes and may have access to GPU models unavailable to the regular gpu queue.
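As a sketch, GPU-related scheduler options in a job script might look like the following; the GPU resource string and values are placeholders to check against Hazel's GPU documentation:

    #!/bin/bash
    #BSUB -n 2                       # CPU cores
    #BSUB -W 60                      # wall-clock limit, in minutes
    #BSUB -q gpu                     # a GPU queue must be named explicitly
    #BSUB -gpu "num=1:mode=shared"   # number of GPUs and how they will be used
    ./my_gpu_program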
The information in this Quick, Quick Start Guide should have given you a good overview of the steps needed to use Hazel effectively. As a next step, we recommend moving on to the Getting Started on Hazel page.