Quick, Quick Start overview of the Hazel Linux Cluster
This guide is intended to provide an overall sense of how the Hazel cluster operates. It is not intended to replace the Quick Start Tutorial or other documentation on this web site.
More in-depth information is available on the pages linked from this short introductory guide.
Basic, essential information covered in this guide:
Research projects are owned by a faculty member. Project members are managed by the faculty member using a self-service web application: https://research.oit.ncsu.edu.
Course instructors may request a course project with access for students enrolled in the course by sending email to oit_hpc@help.ncsu.edu.
A web-based interface, Open OnDemand, is available to all authorized Hazel accounts regardless of your institution. Use your web browser to connect to https://servood.hpc.ncsu.edu. Open OnDemand provides a web-based terminal session on a Hazel login node.
Additional functions and applications are also available from the Open OnDemand portal. See the Open OnDemand documentation for more details.
From a terminal window on Linux, Mac, or Windows, you can connect to a shared Hazel login node using the secure shell command ssh user_name@login.hpc.ncsu.edu, where "user_name" is replaced by your authorized Hazel user name. Currently, only authorized NC State Unity IDs are able to connect using ssh directly.
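For example, a minimal SSH session might look like the following (the Unity ID jdoe is a placeholder; substitute your own):

    # Connect to a shared Hazel login node (replace jdoe with your Unity ID)
    ssh jdoe@login.hpc.ncsu.edu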
There are several different places on Hazel that are available for keeping files. These have different amounts of space available and different intended purposes.
When you initially connect to Hazel, your working directory is your home directory, /home/user_name. There is a 1GB quota, as your home directory is intended for holding only scripts, small applications, and temporary files needed by the scheduling system to run your jobs. Your home directory is backed up daily to another data center. In addition, there are snapshots taken more frequently that can be used to recover accidentally deleted files.
The scratch directory is where data for running jobs should be placed and where results from jobs should be written. Any important files should be moved out of scratch (for example, to Research Storage) at the end of a run. Scratch space is not backed up, and files not accessed for 30 days are automatically deleted.
Your scratch directory is located at /share/group_name/user_name, where group_name is the first group listed in the output of the groups command. Your project has a 20TB scratch directory quota.
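As a sketch, you can locate and change into your scratch directory from the command line like this (the command substitution is just one convenient way to pick up the first group):

    # Show your groups; the first one listed is your project group_name
    groups
    # Change into your scratch directory
    cd /share/$(groups | cut -d' ' -f1)/$USER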
Devising ways to keep files in your scratch directory indefinitely is a violation of the Hazel Acceptable Use Policy.
If you (or your group) are installing your own applications, they should be kept in the application directory (unless they are small enough to fit in your home directory). Each project may request an application directory. Your project's application directory would be located at /usr/local/usrapps/group_name. Projects receive a 100GB default application directory quota.
Requesting an application directory for your project
Research Storage is accessible from all Hazel nodes. The recommended workflow is to copy data from Research Storage to your scratch directory at the beginning of your job script, then, at the end of your run, copy results back from your scratch directory to Research Storage.
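A minimal sketch of that copy-in/copy-out pattern inside a job script might look like the following; the Research Storage path, file names, and program name are all placeholders:

    # Copy input data from Research Storage to scratch (paths are examples)
    cp /path/to/research_storage/input.dat /share/group_name/user_name/
    cd /share/group_name/user_name
    # Run the job
    ./my_program input.dat > results.out
    # Copy results back to Research Storage
    cp results.out /path/to/research_storage/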
There are a couple of ways to move files to or from Hazel using your web browser.
The file browser widget available in Open OnDemand provides an easy way to move files up to 10GB between your local computer and Hazel.
Globus is the preferred tool for moving larger files. It is able to restart interrupted network sessions and computes a checksum at the end of the transfer to ensure the data was moved correctly. The location you are moving data to or from will need to be connected to a Globus Connect Server or have Globus Connect Personal installed.
From a terminal window, the sftp command can be used to transfer files between your local computer and Hazel.
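For example, an sftp session might look like the sketch below (the Unity ID, paths, and file names are placeholders):

    # Open an sftp session to a Hazel login node
    sftp jdoe@login.hpc.ncsu.edu
    sftp> put local_data.tar.gz /share/group_name/jdoe/
    sftp> get /share/group_name/jdoe/results.out
    sftp> quit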
There is a set of applications maintained on Hazel by the HPC staff. Generally, to use one of these applications you will use the module command to set up your environment to access the application, create a job script to execute the application, and then submit the job script to the scheduling system.
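A typical sequence with the module command might look like this sketch (the application name is illustrative; run module avail to see what is actually installed):

    # List available applications
    module avail
    # Load an application into your environment (name is an example)
    module load R
    # Show what is currently loaded
    module list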
Unless the application is small, individually installed applications will be located in your project's application directory.
Best practice is to create a module file for the application so that the module command can be used to configure your environment to run the application.
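As a sketch, if your project keeps its module files in a directory of its own (the modulefiles path below is an assumption, not a fixed convention; see the FAQ linked next), you could make them visible and load one like this:

    # Add your project's module files to the module search path (path is an example)
    module use --append /usr/local/usrapps/group_name/modulefiles
    # Load your individually installed application (name is hypothetical)
    module load myapp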
Individually installed application FAQ
The primary tool recommended for installing applications on Hazel is Conda. Conda will automatically resolve dependencies and select a compatible set of library versions to build your application.
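A minimal sketch of using Conda to build an environment under your project's application directory (the environment path and package names are placeholders):

    # Create a Conda environment under your application directory
    conda create --prefix /usr/local/usrapps/group_name/env_myapp python=3.11 numpy
    # Activate the environment by path
    conda activate /usr/local/usrapps/group_name/env_myapp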
Containers offer a potentially even easier approach as there may be a container already built with your application. See the Apptainer documentation for details about using containers on Hazel.
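For instance, pulling and running a prebuilt container with Apptainer might look like the following (the image is an arbitrary example, not a recommendation):

    # Pull a prebuilt image from Docker Hub and run a command inside it
    apptainer pull docker://rocker/r-base
    apptainer exec r-base_latest.sif R --version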
Hazel uses LSF for resource management and scheduling. LSF is analogous to Slurm, which you may have encountered on other clusters.
For anything other than a trivial job, we strongly recommend creating a job script: a file containing the scheduler options and Linux commands needed to execute your job.
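A minimal sketch of an LSF job script is shown below; the resource values and program name are placeholders to adapt to your own job:

    #!/bin/bash
    #BSUB -n 4              # number of cores
    #BSUB -W 120            # wall-clock time limit, in minutes
    #BSUB -o out.%J         # standard output file (%J expands to the job ID)
    #BSUB -e err.%J         # standard error file
    ./my_program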
Applications must not be executed on the login nodes. Anything that uses significant CPU or memory resources must be run via LSF.
There is a set of queues available to all Hazel users. HPC Partners have additional, dedicated queues available to them and their projects. CPU jobs need to specify the number of cores they will use and the length of time they will run (in wall clock time). Generally it is best to allow LSF to select the queue in which to place your job; if you have access to a partner queue, however, you should generally place your job in that queue.
The bsub command is used to place a job in a queue. Once a job is queued, the bjobs command can be used to monitor the job's progress. An overall view of all jobs currently running or pending on Hazel is provided by the bqueues command.
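Put together, submitting and monitoring a job might look like this sketch (the script name is a placeholder):

    # Submit the job script to LSF
    bsub < myjob.sh
    # Check the status of your own jobs
    bjobs
    # See overall queue activity across the cluster
    bqueues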
Documentation on running CPU jobs on Hazel
Like CPU jobs, GPU jobs must specify the number of CPU cores they will use and for how long. GPU jobs must also specify how many GPUs they will use and how they will use them.
Unlike CPU jobs, GPU jobs must specify a GPU queue to be placed into. The generally available GPU queues are gpu and short_gpu. The short_gpu queue has access to partner GPU nodes and may have access to GPU models unavailable to the regular gpu queue.
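As a sketch, GPU-related scheduler options in a job script might look like the following; the GPU resource string and values are placeholders to check against Hazel's GPU documentation:

    #!/bin/bash
    #BSUB -n 2                       # CPU cores
    #BSUB -W 60                      # wall-clock limit, in minutes
    #BSUB -q gpu                     # a GPU queue must be named explicitly
    #BSUB -gpu "num=1:mode=shared"   # number of GPUs and how they will be used
    ./my_gpu_program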
The information in this Quick, Quick Start Guide should have given you a good overview of the steps needed to use Hazel effectively. As a next step, we recommend moving on to the Getting Started on Hazel page.