-
6-7 February 2021 - Data Center Network Maintenance
ComTech will be performing network maintenance in the data centers this weekend. This will cause interruptions in communications between Henry2 nodes for job management and storage access. To help minimize potential job issues, LSF will be set to stop scheduling jobs on Friday, February 5, and scheduling will resume when the maintenance is completed.
-
18 January 2021 - LSF Patching
LSF will be patched to version 10.1.11 starting about 9am Monday January 19. There may be interruptions in LSF availability and new job scheduling during the day Monday as patches are installed and LSF is restarted.
-
18 November 2020 - Updated default modules
The following modules were updated:
cmake to 3.19.0
conda to 4.9.2
julia to 1.5.3
-
07 November 2020 - Data Center Network Maintenance
ComTech will be performing network maintenance in the data centers this weekend. This will cause some periods during which the Henry2 nodes will lose file system access. To help minimize potential job issues, LSF will be set to stop scheduling jobs on Friday, Nov 6, and scheduling will resume when the maintenance is completed on Sunday, Nov 8.
-
20 September 2020 - Software and module updates
Default versions of these modules have been updated to:
CMake 3.19.2
Julia 1.5.1
Conda 4.8.5
Python 3.7.7
-
14 July 2020 - Update default module for R
The default module for R will be changed to R 4.0.2 on August 10. The older module will still be available, but the full path will need to be specified when using the module for R 3.5.1. Please contact the help desk if you would like assistance in porting your R libraries to R 4.0.2.
-
8 July 2020 - BLAST+ update
The default BLAST+ module has been updated to version 2.10.1.
-
3 May 2020 - Python modules have been removed
The Python modules are deprecated and have been removed.
Users should do 'module load conda' to use Python 3
and use Conda to install custom Python environments.
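As a sketch of that Conda workflow (the environment name myenv and package list are illustrative, not prescribed by this announcement):

```shell
module load conda                          # set up Conda on the cluster
conda create -y -n myenv python=3 numpy    # create a custom environment (illustrative packages)
conda activate myenv                       # use the environment
python --version                           # confirm Python 3 is active
```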
-
28 April 2020 - rtx2080 GPU node is up.
-
25 April 2020 - rtx2080 GPU node is down for repairs.
-
25 April 2020 - Extended Maintenance Weekend
April 25-26 there will be an extended maintenance window for the data center where the Henry2 cluster is located. During this maintenance, updates will be applied to the core Ethernet switches for the cluster. These updates will cause interruptions in communications between nodes for job management and storage access. If possible, users should avoid running jobs during this maintenance.
-
21 April 2020 - rtx2080, gtx1080, p100 GPU nodes up
These 3 nodes are up with updated drivers, version 440.x.
The cuda-10.2 toolkit is also installed:
module load cuda/10.2
-
13 April 2020 - Python modules will be removed
The Python modules are deprecated and will be removed on May 1st, 2020.
Users should do 'module load conda' to use Python 3:
https://hpc.ncsu.edu/Software/Apps.php?app=Python
and use Conda to install custom Python environments:
https://hpc.ncsu.edu/Software/Apps.php?app=Conda
-
12 March 2020 - LSF on Henry2
LSF is now operating normally.
The process of bringing the remaining powered-off compute nodes online will continue in the next couple of days.
-
11 March 2020 - LSF on Henry2
As part of ongoing efforts to recover the Henry2 cluster to normal operating state following the cooling outage, LSF is being shut down on all nodes. It may take a few hours to complete this work. Once complete LSF will be gradually restarted, and hopefully will return to a stable operating state.
-
10 March 2020 - Cooling issue in the Data Center
From approximately 0:15am until 2:15am the Data Center lost its proper cooling.
As a preventive measure to keep the ambient temperature in the Data Center from rising, compute nodes not running any jobs were shut down during that time.
They will be brought back online gradually, but the number of compute nodes available for new jobs will be limited during that process. Therefore some jobs may remain pending longer than usual.
-
1-2 February 2020 - OIT quarterly extended maintenance
During this maintenance some network configuration changes will be applied to the HPC network, which might cause a very brief network disruption. No impact is expected, but users should be aware of the possibility.
-
27 January 2020 - /ncsu/volume1 maintenance
/ncsu/volume1 will be taken off-line to upgrade the underlying file system to a newer version. It will be unavailable from approximately 8am until 9am.
-
25 January 2020 - Software and module updates
R 3.6.2 is installed, but 3.5.1 remains the default module.
CMake 3.16.3 is installed and is the new default module.
Julia 1.3.1 is installed and is the new default module.
BLAST+ 2.10.0 was installed and is the new default module.
Parallel NetCDF for Intel 2017 was installed.
-
22-January-2020 - Henry2 Network Interruption
A network configuration change on the morning of 22 January had unintended side effects. The change was reversed, and the HPC network and cluster recovered after an outage of a few minutes. It is possible that some running jobs were impacted if they attempted file I/O during the network outage.
-
3 October 2019 - cmake update and mpi module rename
The new default module for cmake is version 3.15.4.
The module mpi/gcc_openmpi has been removed. The following will give the same environment:
module load openmpi-gcc/openmpi1.8.4-gcc4.8.2
-
23 September 2019 - Change in requesting memory resources
Previously, memory resource requests such as
-R "rusage[mem=500]"
applied per task.
This type of memory resource request now applies per HOST. For example, a job that previously requested mem=500 with n=4 and span[ptile=4] should now request mem=2000 to reserve the same amount of memory on the host.
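Under the new per-host behavior, the 4-task, single-host example above could be written as the following job script sketch (wall-clock limit and output file name are illustrative):

```shell
#!/bin/bash
#BSUB -n 4                     # 4 tasks
#BSUB -R "span[ptile=4]"       # place all 4 tasks on one host
#BSUB -R "rusage[mem=2000]"    # per-HOST reservation: 4 tasks x 500 = 2000
#BSUB -W 60                    # illustrative 60-minute wall-clock limit
#BSUB -o out.%J                # illustrative output file name

# application command goes here
```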
-
23 September 2019 - New bsub GPU syntax
Following the recent LSF updates there is a new bsub syntax for using GPUs. A suggested option to request one GPU per host assigned to a job is
-gpu "num=1:mode=exclusive_process:mps=yes"
or in a script
#BSUB -gpu "num=1:mode=exclusive_process:mps=yes"
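Putting the option together with an ordinary job script, a minimal single-GPU job might look like the sketch below (the queue name and application command are illustrative assumptions; check the site documentation for the actual GPU queue):

```shell
#!/bin/bash
#BSUB -n 1
#BSUB -W 30                                          # illustrative 30-minute limit
#BSUB -q gpu                                         # illustrative queue name
#BSUB -gpu "num=1:mode=exclusive_process:mps=yes"    # one GPU, exclusive process mode, MPS on
#BSUB -o gpu_job.%J.out

module load cuda/10.2
# ./my_gpu_application                               # illustrative application command
```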
-
23 September 2019 - GPU usage Update
Access to the GPUs has been interrupted while the new syntax of the updated LSF scheduler is implemented. We are working to resolve this problem as soon as possible.
-
15 September 2019 - LSF Update
The 8 September LSF update did not complete. LSF is currently at version 10.1.0.7; a patch will be applied on Sunday 15 September to bring it to 10.1.0.8. We anticipate LSF may be unavailable for much of the morning.
-
08 September 2019 - LSF Update
LSF will be updated to latest patch for version 10.1 - there may be brief periods where 'not responding' messages will display when using LSF commands while the updates are being applied. No impact is anticipated for running jobs. However, slight delays in starting new jobs may occur as the system is restarted.
-
24 June 2019 - jhl* compute nodes network disruption
Due to scheduled maintenance on network equipment at the James B. Hunt Library, these nodes will briefly lose their network connection between 6am and 7am. It's advisable not to schedule any jobs for them during that period.
-
10 April 2019 - CLC Genomics Server Update
Starting 10 April 2019 the CLC Genomics Server on HPC Cluster will be unavailable to allow it to be updated to the current version. The server will be available again starting 15 April 2019. Client software will need to be updated to current version to be compatible with the server after the update.
-
22 March 2019 - compute nodes jhl025-jhl028 unavailable
Due to required maintenance on these nodes at the James B. Hunt Library, they'll be taken off-line. They will be re-opened in LSF once the maintenance is complete.
-
3 February 2019 7pm - /gpfs_share partition is back on-line
-
3 February 2019 10am - /gpfs_partners partition is back on-line
-
1 February 2019 7pm - /gpfs_share and /gpfs_partners partitions are off-line
The servers providing these partitions experienced significant slowness. To investigate the cause, the partitions were temporarily taken off-line. Updates will be posted as they become available.
-
22 January 2019 - NFS exports unavailable & VCL-HPC reservations disabled
1. Due to the upgrade of the OS and GPFS software on the server providing various NFS exports, they will not be available for the duration of this maintenance.
2. VCL-HPC production image will be switched to the new version - "HPC (CentOS 7.5 64 bit VM)".
The expected duration of this maintenance is 2 hours (8am-10am).
-
02-03 January 2019 - jhl* compute nodes unavailable
Due to the scheduled outage at the James B. Hunt Library, jhl* compute nodes will be unavailable during this period. As a preparation for this maintenance these nodes were closed in LSF for future jobs on 12/30/2018. They will be re-opened once the maintenance is complete.
-
01 December 2018 - Gurobi License
A floating Gurobi license is now available on Henry2, and Gurobi 8.1.0 was installed. Use the command 'module load gurobi' to set up the environment for Gurobi 8.1.0.
-
30 November - 3 December 2018 - /rsstu is not available on the HPC cluster
There were errors on a few disks in the storage unit providing /rsstu partition. As a result it was necessary to temporarily un-mount it on HPC cluster pending investigation by the vendor on Monday (12/03/2018). The updates will be posted when they become available.
-
19 October 2018 - Henry2 OS upgrade
Over the next couple of months there will be a gradual OS upgrade to version 7.5 on compute and login nodes. This process should have minimal impact on the general availability of cluster resources. If there are any issues, please report them by creating an NCSU Service Desk incident for OIT_HPC.
-
22 September 2018 - Henry2 core Ethernet switch maintenance
ComTech will be upgrading firmware on the core Ethernet switches used by the Henry2 cluster during the extended datacenter maintenance scheduled for September 22 and 23. Each switch will experience an outage of a few minutes while new firmware is installed. These switches are used for storage access and job management.
-
30 July 2018 - Monthly Maintenance on login nodes
The login nodes will be taken off-line to apply OS security updates at the beginning of each month.
Exact dates and times for each login node will be posted in MOTD (Message Of The Day), displayed on the screen upon logging in to relevant login node.
-
9 July 2018 - Maintenance on login01,login02,login03[.hpc.ncsu.edu]
These machines will be taken off-line to apply OS security updates at these times:
10am - login01.hpc.ncsu.edu
12pm - login02.hpc.ncsu.edu
2pm - login03.hpc.ncsu.edu
The expected duration of each maintenance is 2 hours.
There should be no impact to the usual work flow after the update, but if there is an issue, please report it by creating an NCSU Service Desk incident for OIT_HPC.
UPDATE: The maintenance on login01 was completed at 11:00am.
UPDATE: The maintenance on login02 was completed at 12:40pm.
UPDATE: The maintenance on login03 was completed at 2:45pm.
-
2 July 2018 - Maintenance on login04.hpc.ncsu.edu
login04.hpc.ncsu.edu will be taken off-line starting 10am to apply OS security updates.
The expected duration of this maintenance is 2 hours.
There should be no impact to the usual work flow after the update, but if there is an issue, please report it by creating an NCSU Service Desk incident for OIT_HPC.
UPDATE: The maintenance was completed at 11am.
-
27 June 2018 - Web Application Broken
The database for the research computing web application was migrated to a new platform this morning. The application was not functioning correctly from about 3am until about 9:15am.
-
23 June 2018 - New Top Level HPC Web Pages
As an interim step toward a full redesign of the HPC web site, several new upper-level pages have been made active. The old site, in its entirety, is still available by selecting the 'Legacy Web Site' button on the new main page.
-
1 June 2018 VMD 1.9.3
VMD version 1.9.3 has been installed on the Henry2 cluster. A VCL HPC login node reservation should be used for running VMD [see news item below related to remote desktop connection with HPC-VCL]. Use the following command to set up the environment for VMD 1.9.3:
module load vmd/1.9.3
-
18 March 2018 New Portland Group Compiler Version
[edsills@login01 ~]$ module load PrgEnv-pgi
[edsills@login01 ~]$ module list
Currently Loaded Modulefiles:
1) pgi/18.1 3) openmpi/2.1.2/2018
2) netcdf/4.5.0/openmpi-2.1.2/2018 4) PrgEnv-pgi/18.1
[edsills@login01 ~]$ which pgf90
/usr/local/pgi/linux86-64/18.1/bin/pgf90
[edsills@login01 ~]$ which mpif77
/usr/local/pgi/linux86-64/2018/mpi/openmpi-2.1.2/bin/mpif77
Older versions can be selected by specifying the version explicitly, e.g.
module load PrgEnv-pgi/16.7
We strongly recommend not using any version older than 15.1.
-
22 August 2017 Shared file system upgrade/change completed
We completed the shared file system upgrade/change on 8/21/2017. The old /share, /share2, /share3 are moved to /gpfs_common/old_share, /gpfs_common/old_share2, /gpfs_common/old_share3, respectively, and the old data there will be preserved for 20 days. After that, the old data will be wiped out.
NOTE: You do not have read permission in /gpfs_common anymore and thus you cannot do ls there. To access your data on the old shared file system, you need to type the full path to your own subdirectory, such as /gpfs_common/old_share/your-user-name, when you do cd on login nodes or when you provide folder name in WinSCP.
The new shared file system you can access is
/share/your-group-name
where your-group-name is the first group when you type the command "groups". You can cd into /share/your-group-name and use the command
mkdir your-user-name
to create your own subdirectory, and store data and run jobs in your own subdirectory, where your-user-name is your HPC username.
Each group has a 10TB quota for their group directory /share/your-group-name. As before /share is not backed up and files that have not been recently accessed are automatically deleted (currently purge is set to remove files that have not been accessed for 30 days).
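The first-time setup described above can be sketched as follows (your-group-name and your-user-name are placeholders for your actual primary group and HPC username):

```shell
# Find your primary (first) group
groups | awk '{print $1}'

# Create your own subdirectory under the group share and work there
# (replace the placeholder path components with the values found above)
mkdir /share/your-group-name/your-user-name
cd /share/your-group-name/your-user-name
```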