CUDA
CUDA is a parallel computing platform and programming model from NVIDIA. It is used to write programs that run on NVIDIA GPUs.
External Links:
CUDA toolkit and driver compatibility table
CUDA toolkit website
NVIDIA Easy Introduction to CUDA
How to run on GPUs in the gpu queues
All GPU nodes are now running Red Hat Enterprise Linux 9. Following is an example job script to request use of four A100 GPUs:
#!/bin/bash
#BSUB -n 1
#BSUB -W 30
#BSUB -q gpu
#BSUB -R "select[a100]"
#BSUB -gpu "num=4:mode=shared:mps=yes"
#BSUB -o out.%J
#BSUB -e err.%J
nvidia-smi
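Submit the script to LSF by redirecting it into bsub, for example (the file name submit.sh here is only a placeholder):
bsub < submit.sh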
Quick test of GPU availability
lsload -gpuload
Loading CUDA
There are several versions of CUDA on Hazel. To see the versions available, type
module avail cuda
and
ls /usr/local/apps/cuda/*.
To set the environment, either source the appropriate script or load the default module
module load cuda
Loading the cuda module will put the CUDA compiler nvcc in the path, as well as setting the path to the CUDA libraries.
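After loading the module, you can confirm that the compiler is found with, for example:
nvcc --version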
Exclusive use of the GPUs
- To use a GPU exclusively but allow other jobs to share the other GPUs on the node, use
-gpu "mps=yes:mode=exclusive_process"
With this setting, your job will use the GPU exclusively. Your other jobs and other users' jobs will be able to share the same node, but not that GPU (or those GPUs). Other jobs (yours and others') will be able to use the other free GPUs on that node. For example, the setting -gpu "num=1:mps=yes:mode=exclusive_process" will allow a total of 4 GPU jobs to run on a node with 4 GPUs. The setting -gpu "num=4:mps=yes:mode=exclusive_process", on a node with 4 GPUs, will not allow any other jobs to run, because all 4 GPUs will be used exclusively.
- To use a GPU and allow sharing with your and others' jobs, use
-gpu "mps=yes:mode=shared"
With MPS on (mps=yes:mode=exclusive_process), the NVIDIA MPS architecture is designed to allow a user to run many jobs through the same MPS server, even though the mode is set to exclusive_process. However, our LSF is not set up to allow that, so in effect, if mode=exclusive_process, a job will use the GPU exclusively and will not share it with other jobs from the same user. This means that mps=yes:mode=exclusive_process and mps=no:mode=exclusive_process behave the same where sharing is concerned.
- To use a GPU node exclusively:
Note that users would rarely need to do this, and should not use this capability without serious consideration. Use
#BSUB -x
if the queue allows this. If not, for a 4 GPU node, use
-gpu "num=4:mps=yes:mode=exclusive_process"
or
-gpu "num=4:mps=no:mode=exclusive_process"
How to compile with the correct CUDA version on Hazel
What follows are two approaches to compiling and running code on the GPUs:
[1] Install/compile the application according to the application's documentation, then reserve suitable resources to run it.
[2] Compile/install your code to target specific GPU hardware on Hazel.
Method [1]
Most users will use this method. The application's documentation will specify which version of cuda, and which compute capability (cc) the code should be compiled with.
module avail cuda
shows all of the CUDA toolkit packages available. These should cover any application. For example, if the application requires CUDA toolkit 10.1, then
module load cuda/10.1
will prepare the environment variables so that when you compile your code, the appropriate nvcc, CUDA libraries, and CUDA include files can be found.
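When compiling, the compute capability can be selected with nvcc's -arch flag. For example, to build for the cc 7.5 used in the example below (nnetworks.cu here is a hypothetical source file name):
nvcc -arch=sm_75 -o nnetworks.exe nnetworks.cu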
Next, running your code: having compiled with a certain toolkit and a certain cc, find the range of drivers and hardware that will support that toolkit and cc by looking at these two tables:
CUDA toolkit - driver compatibility table and cc - driver compatibility table
For example, CUDA 10.1 requires a driver >= 418.39 (as seen from the first linked table above). In the table below, you will see that the rtx2080 GPU node is able to support this application (because the installed driver, 525.60.13, is newer than 418.39). Next, check the cc: suppose you compiled your code with cc 7.5. The table below shows that the rtx2080 node supports this cc. Therefore, to run this code, you have to target this node with a batch script like:
#!/bin/bash
#BSUB -n 1
#BSUB -W 30
#BSUB -q gpu
#BSUB -R "select[rtx2080]"
#BSUB -gpu "num=1:mode=shared:mps=yes"
#BSUB -o out.%J
#BSUB -e err.%J
module load PrgEnv-pgi
module load cuda/10.1
./nnetworks.exe
Method [2]
Some users may want to target certain GPUs. For example, suppose a user wants to take advantage of the older GPUs. First, look at the table below to see the cc and drivers for these nodes: they are cc 6.0 (p100) and 6.1 (gtx1080), and the driver is 525.60.13. Then look at the CUDA toolkit - driver compatibility table to see what CUDA toolkit should be used. This shows that CUDA 12.x will work. So when preparing the environment variables for compiling, use:
module load cuda/12.0
since that is available on our system. Also make sure that the code is compiled with the cc of the node you will target (6.1 for the gtx1080 node used below).
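For example, to build for the gtx1080's cc 6.1 (again using the hypothetical source file name nnetworks.cu):
nvcc -arch=sm_61 -o nnetworks.exe nnetworks.cu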
After compilation, run the code to target the intended resources with a batch script that might look like:
#!/bin/bash
#BSUB -n 1
#BSUB -W 30
#BSUB -q gpu
#BSUB -R "select[gtx1080]"
#BSUB -gpu "num=1:mode=shared:mps=yes"
#BSUB -o out.%J
#BSUB -e err.%J
module load PrgEnv-pgi
module load cuda/12.0
./nnetworks.exe
List of GPU nodes, their compute capability (cc) and GPU drivers
This information can be obtained with
lshosts -gpu

Resource type   Description                 cc    Driver (NVIDIA)
a100            Node with A100 GPUs         8.0   535.86.10
a30             Node with A30 GPUs          8.0   535.86.10
a10             Node with A10 GPUs          8.0   535.86.10
rtx2080         Node with RTX 2080 GPUs     7.5   525.60.13
gtx1080         Node with GTX 1080 GPUs     6.1   525.60.13
p100            Node with P100 GPUs         6.0   525.60.13
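To confirm the compute capability of the GPU(s) a job actually lands on, a small CUDA program can query the runtime directly. A minimal sketch (the file name query_cc.cu is only an example):

// query_cc.cu: print the name and compute capability of each visible GPU.
// Compile with, e.g.:  nvcc -o query_cc query_cc.cu
#include <cstdio>
#include <cuda_runtime.h>

int main(void) {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("GPU %d: %s, compute capability %d.%d\n",
               i, prop.name, prop.major, prop.minor);
    }
    return 0;
}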
Example codes
Use of CUDA on the GPUs is demonstrated with the following example code that adds two vectors.
CUDA C/C++ Example:
ReadMe
C/C++ Makefile
vectorAdd.cu
CUDA for Fortran Example:
ReadMe
Fortran Makefile
Fortran file
Cuda File
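For reference, a minimal sketch of a CUDA C vector-add program is shown below. It illustrates the same pattern as the linked example but is not necessarily identical to the linked vectorAdd.cu.

// vectorAdd_sketch.cu: add two vectors on the GPU (minimal sketch).
// Compile with, e.g.:  nvcc -o vectorAdd vectorAdd_sketch.cu
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Each thread adds one pair of elements.
__global__ void vectorAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main(void) {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Allocate and initialize host arrays.
    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    // Allocate device arrays and copy the inputs to the GPU.
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // Launch one thread per element.
    const int threadsPerBlock = 256;
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    vectorAdd<<<blocks, threadsPerBlock>>>(d_a, d_b, d_c, n);

    // Copy the result back and spot-check one element.
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f (expected 3.0)\n", h_c[0]);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}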