Archived HPC News

HPC News is now being hosted on the IT Service Portal.
  • 6-7 February 2021 - Data Center Network Maintenance

      ComTech will be performing network maintenance in the data centers this weekend. This will cause interruptions in communications between Henry2 nodes for job management and storage access. To help minimize potential job issues, LSF will be set to stop scheduling jobs on Friday, February 5, and scheduling will resume when the maintenance is completed.
  • 18 January 2021 - LSF Patching

      LSF will be patched to version 10.1.11 starting at about 9am on Monday, January 19. There may be interruptions in LSF availability and new job scheduling during the day Monday as patches are installed and LSF is restarted.
  • 18 November 2020 - Updated default modules

      The following modules were updated:
      cmake to 3.19.0
      conda to 4.9.2
      julia to 1.5.3
  • 07 November 2020 - Data Center Network Maintenance

      ComTech will be performing network maintenance in the data centers this weekend. This will cause periods during which the Henry2 nodes will lose file system access. To help minimize potential job issues, LSF will be set to stop scheduling jobs on Friday, Nov 6, and scheduling will resume when the maintenance is completed on Sunday, Nov 8.
  • 20 September 2020 - Software and module updates

      Default versions of these modules have been updated to:
      CMake 3.19.2
      Julia 1.5.1
      Conda 4.8.5
      Python 3.7.7
  • 14 July 2020 - Update default module for R

      The default module for R will be changed to R 4.0.2 on August 10. The older module will still be available, but its full path will need to be specified to use R 3.5.1. Please contact the help desk if you would like assistance in porting your R libraries to R 4.0.2.
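      As a rough sketch of reinstalling a library under the new default (the package name ggplot2 is only an illustration, not a required step):
      module load R          # after 10 August this loads R 4.0.2
      Rscript -e 'install.packages("ggplot2", repos="https://cran.r-project.org")'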
  • 8 July 2020 - BLAST+ update

      The default BLAST+ module has been updated to version 2.10.1.
  • 3 May 2020 - Python modules have been removed

      The Python modules are deprecated and have been removed. Users should run 'module load conda' to use Python 3 and use Conda to install custom Python environments.
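      For example, a minimal sketch of creating and using a custom environment (the environment name py37 and the package list are only illustrations):
      module load conda
      conda create -n py37 python=3.7 numpy    # create a custom environment
      conda activate py37                      # or 'source activate py37', depending on shell setup
      python --version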
  • 28 April 2020 - rtx2080 GPU node is up.

  • 25 April 2020 - rtx2080 GPU node is down for repairs.

  • 25 April 2020 - Extended Maintenance Weekend

      April 25-26 there will be an extended maintenance window for the data center where the Henry2 cluster is located. During this maintenance, updates will be applied to the core Ethernet switches for the cluster. These updates will cause interruptions in communications between nodes for job management and storage access. If possible, users should avoid running jobs during this maintenance.
  • 21 April 2020 - rtx2080, gtx1080, p100 GPU nodes up

      These three nodes are up with updated drivers (440.+).
      The cuda-10.2 toolkit is also installed; to set up the environment, use:
      module load cuda/10.2
  • 13 April 2020 - Python modules will be removed

      The Python modules are deprecated and will be removed on May 1st, 2020.
      Users should run 'module load conda' to use Python 3:
      https://hpc.ncsu.edu/Software/Apps.php?app=Python
      and use Conda to install custom Python environments:
      https://hpc.ncsu.edu/Software/Apps.php?app=Conda
  • 12 March 2020 - LSF on Henry2

      LSF is now operating normally.
      The process of bringing the remaining powered-off compute nodes online will continue in the next couple of days.
  • 11 March 2020 - LSF on Henry2

      As part of ongoing efforts to return the Henry2 cluster to a normal operating state following the cooling outage, LSF is being shut down on all nodes. It may take a few hours to complete this work. Once complete, LSF will be gradually restarted and, hopefully, will return to a stable operating state.
  • 10 March 2020 - Cooling issue in the Data Center

      From approximately 0:15am until 2:15am the Data Center lost proper cooling.
      As a preventive measure to keep the ambient temperature in the Data Center from rising, compute nodes not running any jobs were shut down during that time.
      They will be brought back online gradually, but the number of compute nodes available for new jobs will be limited during that process. Therefore, some jobs may remain pending longer than usual.
  • 1-2 February 2020 - OIT quarterly extended maintenance

      During this maintenance some configuration changes will be applied to the HPC network, which might cause a very short network disruption. No disruption is expected, but users should be aware of the possibility.
  • 27 January 2020 - /ncsu/volume1 maintenance

      /ncsu/volume1 will be taken off-line to upgrade the underlying file system to a newer version. It will be unavailable from approximately 8am until 9am.
  • 25 January 2020 - Software and module updates

      R 3.6.2 is installed, but 3.5.1 remains the default module.
      CMake 3.16.3 is installed and is the new default module.
      Julia 1.3.1 is installed and is the new default module.
      BLAST+ 2.10.0 was installed and is the new default module.
      Parallel NetCDF for Intel 2017 was installed.
  • 22 January 2020 - Henry2 Network Interruption

      A network configuration change on the morning of 22 January had unintended side effects. The change was reversed, and the HPC network and cluster recovered after an outage of a few minutes. Some running jobs may have been impacted if they attempted file I/O during the network outage.
  • 3 October 2019 - cmake update and mpi module rename

      The new default module for cmake is version 3.15.4.

      The module mpi/gcc_openmpi has been removed. The following will give the same environment:
      module load openmpi-gcc/openmpi1.8.4-gcc4.8.2
  • 23 September 2019 - Change in requesting memory resources

      Previously, memory resource requests such as -R "rusage[mem=500]" applied per task. This type of memory resource request now applies per HOST - so a job that previously requested mem=500 with n=4 and span[ptile=4] would now request mem=2000 to reserve the same amount of memory on the host.
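      As an illustration of the change (the executable name is a placeholder):
      # Before the change: mem applied per task (4 tasks x 500 MB each)
      bsub -n 4 -R "span[ptile=4] rusage[mem=500]" ./myprog
      # Now: mem applies per host, so request the per-host total instead
      bsub -n 4 -R "span[ptile=4] rusage[mem=2000]" ./myprog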
  • 23 September 2019 - New bsub GPU syntax

      Following the recent LSF updates there is a new bsub syntax for using GPUs. A suggested option to request one GPU per host assigned to a job is -gpu "num=1:mode=exclusive_process:mps=yes", or in a batch script, #BSUB -gpu "num=1:mode=exclusive_process:mps=yes".
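      A minimal batch-script sketch using this option follows; the queue name, walltime, and executable are illustrative placeholders, not documented defaults:
      #!/bin/bash
      # hypothetical queue name, walltime, and executable; adjust for your setup
      #BSUB -q gpu
      #BSUB -n 1
      #BSUB -W 60
      #BSUB -gpu "num=1:mode=exclusive_process:mps=yes"
      ./my_gpu_app
      Submit the script in the usual way, e.g. bsub < jobscript.sh (the file name is also a placeholder).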
  • 23 September 2019 - GPU usage Update

      Access to the GPUs has been interrupted while the new syntax of the updated LSF scheduler is implemented. We are working to resolve this problem as soon as possible.
  • 15 September 2019 - LSF Update

      The 8 September LSF update did not complete; the cluster is currently at version 10.1.0.7. A patch to bring the version to 10.1.0.8 will be applied on Sunday, 15 September. LSF may be unavailable for much of the morning.
  • 08 September 2019 - LSF Update

      LSF will be updated to the latest patch for version 10.1. There may be brief periods when 'not responding' messages are displayed by LSF commands while the updates are being applied. No impact is anticipated for running jobs. However, slight delays in starting new jobs may occur as the system is restarted.
  • 24 June 2019 - jhl* compute nodes network disruption

      Due to scheduled maintenance on network equipment at the James B. Hunt Library, these nodes will briefly lose their network connection between 6am and 7am. It is advisable not to schedule any jobs on them during that period.
  • 10 April 2019 - CLC Genomics Server Update

      Starting 10 April 2019 the CLC Genomics Server on the HPC cluster will be unavailable to allow it to be updated to the current version. The server will be available again starting 15 April 2019. Client software will need to be updated to the current version to remain compatible with the server after the update.
  • 22 March 2019 - compute nodes jhl025-jhl028 unavailable

      Due to required maintenance on these nodes at the James B. Hunt Library, they'll be taken off-line. They will be re-opened in LSF once the maintenance is complete.
  • 3 February 2019 7pm - /gpfs_share partition is back on-line

    • 3 February 2019 10am - /gpfs_partners partition is back on-line

      • 1 February 2019 7pm - /gpfs_share and /gpfs_partners partitions are off-line

          The servers providing these partitions experienced significant slowness. To investigate the cause, the partitions were temporarily taken off-line. Updates will be posted when they become available.
      • 22 January 2019 - NFS exports unavailable & VCL-HPC reservations disabled

          1. Due to the upgrade of the OS and GPFS software on the server providing various NFS exports, they will not be available for the duration of this maintenance.
          2. The VCL-HPC production image will be switched to the new version, "HPC (CentOS 7.5 64 bit VM)".
          The expected duration of this maintenance is 2 hours (8am-10am).
      • 02-03 January 2019 - jhl* compute nodes unavailable

          Due to the scheduled outage at the James B. Hunt Library, jhl* compute nodes will be unavailable during this period. In preparation for this maintenance, these nodes were closed in LSF to new jobs on 12/30/2018. They will be re-opened once the maintenance is complete.
      • 01 December 2018 - Gurobi License

          A floating Gurobi license is now available on Henry2. Gurobi 8.1.0 was also installed. Use the command module load gurobi to set up the environment to use Gurobi 8.1.0.
      • 30 November - 3 December 2018 - /rsstu is not available on the HPC cluster

          There were errors on a few disks in the storage unit providing the /rsstu partition. As a result it was necessary to temporarily unmount it on the HPC cluster pending investigation by the vendor on Monday (12/03/2018). Updates will be posted when they become available.
      • 19 October 2018 - Henry2 OS upgrade

          Over the next couple of months there will be a gradual OS upgrade to version 7.5 on compute and login nodes. This process should have minimal impact on the general availability of cluster resources. If there are any issues, please report them by creating an NCSU Service Desk incident for OIT_HPC.
      • 22 September 2018 - Henry2 core Ethernet switch maintenance

          ComTech will be upgrading firmware on the core Ethernet switches used by the Henry2 cluster during the extended datacenter maintenance scheduled for September 22 and 23. Each switch will experience a few-minute outage while the new firmware is installed. These switches are used for storage access and job management.
      • 30 July 2018 - Monthly Maintenance on login nodes

          The login nodes will be taken off-line to apply OS security updates at the beginning of each month.
          Exact dates and times for each login node will be posted in the MOTD (Message Of The Day), displayed on the screen upon logging in to the relevant login node.
      • 9 July 2018 - Maintenance on login01,login02,login03[.hpc.ncsu.edu]

          These machines will be taken off-line to apply OS security updates at these times:
          10am - login01.hpc.ncsu.edu
          12pm - login02.hpc.ncsu.edu
          2pm - login03.hpc.ncsu.edu
          The expected duration of each maintenance is 2 hours.
          There should be no impact to the usual workflow after the update, but if there is an issue, please report it by creating an NCSU Service Desk incident for OIT_HPC.
          UPDATE: The maintenance on login01 was completed at 11:00am.
          UPDATE: The maintenance on login02 was completed at 12:40pm.
          UPDATE: The maintenance on login03 was completed at 2:45pm.
      • 2 July 2018 - Maintenance on login04.hpc.ncsu.edu

          login04.hpc.ncsu.edu will be taken off-line starting at 10am to apply OS security updates. The expected duration of this maintenance is 2 hours. There should be no impact to the usual workflow after the update, but if there is an issue, please report it by creating an NCSU Service Desk incident for OIT_HPC.
          UPDATE: The maintenance was completed at 11am.
      • 27 June 2018 - Web Application Broken

          The database for the research computing web application was migrated to a new platform this morning. The application was not functioning correctly from about 3am until about 9:15am.
      • 23 June 2018 - New Top Level HPC Web Pages

          As an interim step toward a full redesign of the HPC web site, several new upper-level pages have been made active. The old site, in its entirety, is still available by selecting the 'Legacy Web Site' button on the new main page.
      • 1 June 2018 VMD 1.9.3

          VMD version 1.9.3 has been installed on the henry2 cluster. A VCL HPC login node reservation should be used for running VMD [see the news item below about remote desktop connection with HPC-VCL]. Use the following command to set up the environment for VMD 1.9.3:
          module load vmd/1.9.3
          
      • 1 June 2018 Amber 18

          Amber 18 is now available on the henry2 cluster. It was built using the Intel compiler, so the Intel programming environment needs to be loaded in addition to the Amber 18 environment. Use the following commands to set up the environment for Amber 18 (from either the command line or a batch script):
          module load PrgEnv-intel/2017.1.132
          module load amber/18
          
      • 1 June 2018 PGI 18.4

          PGI version 18.4 has been installed. Use the following command to set up the PGI 18.4 programming environment:
          module load PrgEnv-pgi
          
          Older versions can be selected by specifying the version explicitly, e.g.
          module load PrgEnv-pgi/18.1
          
      • 9 May 2018 Remote Desktop connection with HPC-VCL

          It is now possible to have a Linux Desktop environment on the HPC with the HPC-VCL login node.
      • 9 April 2018 /share, /gpfs_common, /gpfs_backup

          Starting at about 8am these file systems will be unavailable to allow a physical repair to their storage array.
      • 22 March 2018 New Abaqus Version

          Abaqus version 2018 has been installed.

          module load abaqus

          will set up the environment to use Abaqus 2018. Run the command abaqus to invoke Abaqus.

          Previous versions can be accessed using

          module load abaqus/2016

          or

          module load abaqus/6.13-2

          and the target of the abaqus command will be adjusted appropriately.

      • 18 March 2018 New Portland Group Compiler Version

          PGI version 18.1 has been installed.

          module load PrgEnv-pgi

          will set up the environment to use the new version.

          [edsills@login01 ~]$ module load PrgEnv-pgi
          [edsills@login01 ~]$ module list
          Currently Loaded Modulefiles:
           1) pgi/18.1                          3) openmpi/2.1.2/2018
           2) netcdf/4.5.0/openmpi-2.1.2/2018   4) PrgEnv-pgi/18.1
          [edsills@login01 ~]$ which pgf90
          /usr/local/pgi/linux86-64/18.1/bin/pgf90
          [edsills@login01 ~]$ which mpif77
          /usr/local/pgi/linux86-64/2018/mpi/openmpi-2.1.2/bin/mpif77
          

          Older versions can be selected by specifying the version explicitly, e.g.

          module load PrgEnv-pgi/16.7

          We strongly recommend not using any version older than 15.1.

      • 03-04 March 2018 Network Switch Reboot

          During the extended data center maintenance scheduled for 03-04 March 2018, the ComTech switches that provide the core network for the henry2 cluster will be rebooted to update their software to the latest version for compliance with security standards. The reboots will interrupt network connections between henry2 nodes and between nodes and storage. Running jobs attempting to communicate with other nodes or to access storage during these events will most likely fail.

          Please plan job submissions to avoid having critical jobs running this weekend.

      • 04 January 2018 LSF Update

          On Saturday, 04 Jan 2018, the version of LSF on the henry2 cluster will be updated from 8.3 to 10.1.

          There should be no impact to running jobs. New job submissions and new jobs starting will be temporarily disabled while the upgrade is in progress. The interruption to new jobs is expected to be less than 4 hours.

      • 01 December 2017 New LSF Resource Definitions

          The following new LSF resources have been defined:
          • sse
          • sse2
          • ssse3
          • sse4_1
          • sse4_2
          • avx
          • avx2
          These correspond to the various vector instruction sets supported on Intel Xeon processors. They can be used in the bsub resource string to specify that a job be scheduled on nodes whose processors support the specified instruction set.
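          For example, a sketch of requesting AVX2-capable nodes (the executable name is a placeholder):
          # schedule only on nodes whose processors support AVX2
          bsub -n 8 -R "select[avx2]" ./my_simd_program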
      • 26 October 2017 Gaussian 16.a03 Installed

          Gaussian 16 and the associated GaussView have been installed on the henry2 cluster. Use the add g16 command to configure the shell environment to use these new versions.
      • 22 August 2017 Shared file system upgrade/change completed

          We completed the shared file system upgrade/change on 8/21/2017. The old /share, /share2, /share3 have been moved to /gpfs_common/old_share, /gpfs_common/old_share2, /gpfs_common/old_share3, respectively, and the old data there will be preserved for 20 days. After that, the old data will be wiped out.

          NOTE: You no longer have read permission in /gpfs_common and thus you cannot do ls there. To access your data on the old shared file system, you need to type the full path to your own subdirectory, such as /gpfs_common/old_share/your-user-name, when you cd on the login nodes or when you provide the folder name in WinSCP.

          The new shared file system you can access is

          /share/your-group-name

          where your-group-name is the first group listed when you type the command "groups". You can cd into /share/your-group-name and use the command

          mkdir your-user-name

          to create your own subdirectory, where your-user-name is your HPC username; store data and run jobs in that subdirectory.

          Each group has a 10TB quota for their group directory /share/your-group-name. As before /share is not backed up and files that have not been recently accessed are automatically deleted (currently purge is set to remove files that have not been accessed for 30 days).
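          Putting the steps above together, a minimal sketch (replace the placeholder names with your own group and username):
          groups                       # the first group listed is your-group-name
          cd /share/your-group-name
          mkdir your-user-name         # create your personal subdirectory
          cd your-user-name            # store data and run jobs here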

      • 21 August 2017 Henry2 Cluster Unavailable

          The cluster will be unavailable from midnight to noon. A failing Ethernet switch will be replaced. The switch replacement will interrupt connections to storage for the login nodes and many compute nodes.

          Since many jobs would be impacted anyway, the outage will also be used to move /gpfs_common and /gpfs_backup to new hardware. Therefore the originally announced one-hour maintenance is being extended to 12 hours, but this will avoid the need for another interruption to running jobs in the near future.

          A new organization of /share, /share1, and /share2 will be implemented on the new storage that will provide additional scratch quota to all HPC projects.

          Job scripts will need to be changed to reflect the new directory structure.

      • 11 August 2017 Henry2 /home quotas

          Quotas on /home had not been enforced since the July 5th move of /home to new storage hardware. As of this morning quotas are again being enforced on /home. Also, the quota command is working normally again.
      • 5 July 2017 Henry2 Cluster Unavailable

          The cluster will be unavailable starting at approximately 8am on July 5 to allow the /home and /usr/local file systems to be moved to new storage hardware. The cluster is expected to be available again late in the day on July 5th.

          Jobs running at 8am will be lost. Queues will be disabled over the July 4th holiday to minimize the number of lost jobs.

      • 27 June 2017 henry2 /home and /usr/local

          The /home and /usr/local file systems on the henry2 cluster were offline due to a partial power outage in the data center overnight (about 9pm). The file systems were recovered and back online around 9am. Work continues on restarting compute nodes that did not come back online cleanly following the power outage.
      • 23 April 2017 Network maintenance

          ComTech will be performing maintenance on the network switches that form the core of the HPC cluster network starting at 1pm; the work is expected to take approximately 3 hours. During this time various parts of the cluster will be impacted as each of the 6 switches is updated.

          Queues will be paused Friday April 21 around 5pm to reduce the number of jobs running Sunday afternoon. Jobs running during the Sunday maintenance that try to access storage will likely fail.

      • 6 April 2017 henry2 logins

          Login attempts to the henry2 cluster are currently failing. A Fibre Channel switch failed, resulting in loss of connection to /home and /usr/local. The switch has been replaced and the file systems are available again. Login authentication has returned to normal.