Recent Changes on Hazel Cluster
- Hierarchical fair share has been enabled for all projects.
This applies a second level of fair share scheduling at the project group level.
- A default memory limit is being implemented across Hazel queues.
Initially this is 2 GB per task. Additional memory can be requested using the bsub -M option.
- All available login and compute nodes have been updated to RHEL 9.
- login.hpc.ncsu.edu now goes to RHEL 9 login nodes
- RHEL 9.2 HPC login node image available via VCL
- New GPU nodes are available from the 'gpu' LSF queue
- Nvidia L40 and H100 GPUs (from NSF CC* grant award)
- All GPU nodes are now running Red Hat Enterprise Linux 9 and are available from the 'gpu' queue
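As an illustration of requesting more than the default memory limit with -M, here is a minimal sketch of an LSF job script. The job name, task count, time limit, and memory value are hypothetical. The unit in which -M is interpreted depends on the site's LSF configuration, so the value shown assumes GB and should be checked against local documentation before use.

```shell
#!/bin/bash
# Hypothetical job name.
#BSUB -J mem_example
# Number of tasks (illustrative).
#BSUB -n 4
# Wall-clock limit in minutes (illustrative).
#BSUB -W 30
# Memory limit above the default; ASSUMES the site interprets this value in GB.
#BSUB -M 8
# Output and error files; %J expands to the job ID.
#BSUB -o out.%J
#BSUB -e err.%J

# LSB_DJOB_NUMPROC is set by LSF inside a job; fall back to 1 when run elsewhere.
NTASKS="${LSB_DJOB_NUMPROC:-1}"
echo "Job started with ${NTASKS} slot(s)"
```

A script like this would typically be submitted with `bsub < script.sh`, where script.sh is a placeholder file name.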
Note Regarding Application Compatibility with RHEL 9
If you encounter compatibility issues with an application running on RHEL 9 and do not have access to source code to recompile the application, here are some potential solutions:
Note Regarding Message Passing Interconnects
With the RHEL 9 update, a number of older Flex node InfiniBand interfaces
are no longer supported. All Flex nodes now use 10 Gbps Ethernet for
message passing. These nodes do not have connections between chassis and
are primarily accessible from the single_chassis queue. These Flex nodes
have an LSF resource 'e10g' assigned to them.
Newer nodes with InfiniBand interfaces have an LSF resource 'ib' assigned.
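A job can ask LSF for nodes carrying one of these resources through a resource requirement string. The sketch below assumes 'ib' is a boolean resource that can be selected with -R "select[ib]"; the page only states that the resource is assigned, so treat the exact string as an assumption. The job name, task count, and time limit are hypothetical.

```shell
#!/bin/bash
# Hypothetical job name and sizes.
#BSUB -J mpi_example
#BSUB -n 32
#BSUB -W 60
# Restrict the job to nodes that carry the 'ib' (InfiniBand) resource.
# Assumed alternative for the Ethernet-only Flex nodes:
#   queue single_chassis with the resource string "select[e10g]".
#BSUB -R "select[ib]"
#BSUB -o out.%J
#BSUB -e err.%J

# LSB_HOSTS is set by LSF inside a job; fall back to the local host name elsewhere.
HOSTS="${LSB_HOSTS:-${HOSTNAME:-localhost}}"
echo "Allocated hosts: ${HOSTS}"
```

Printing the allocated hosts at the start of the job makes it easy to confirm afterwards which node type the job actually ran on.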
Work that Remains In-Progress
- Update videos to reflect new cluster name and default shell
Significant Changes between Hazel and Henry2
- Running RHEL 9.2 - Henry2 had been running CentOS 7.5
- Running LSF 10.1.0.13 (since updated to 10.1.0.14) - Henry2 had been running 10.1.0.10 [this is a significant difference, as IBM includes substantial feature changes in the minor release levels]
- Default shell is bash - Henry2 had used tcsh
- Began operation with significantly fewer cores than Henry2 (which had about 10,000 cores)
- Operation began Jan 9, 2023 with about 3000 cores
- Added about 3800 cores from Henry2 nodes (what had been racks 2b and 2l)
- Added about 1600 cores with new partner nodes
- Added about 2000 Flex Chassis node cores from Henry2
- Some additional Flex Chassis nodes may be added; however, rack and cooling capacity is being consumed largely by new partner nodes