module load cuda
Loading the cuda module puts the CUDA compiler, nvcc, in your path
and sets the path to the CUDA libraries.
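To confirm that the module was loaded as described above, a quick check along the following lines may help. This is a minimal sketch written for bash or sh rather than tcsh; the function name check_nvcc is hypothetical, and the only nvcc option used is --version.

```shell
# Hypothetical helper: report whether nvcc is on the PATH.
# Run it after "module load cuda".
check_nvcc() {
    if command -v nvcc >/dev/null 2>&1; then
        echo "nvcc found at: $(command -v nvcc)"
        nvcc --version
    else
        echo "nvcc not found: run 'module load cuda' first"
    fi
}

check_nvcc
```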
How to compile with the correct CUDA version on Henry2
There are two approaches to compiling and running code on the GPUs:
[1] (a) Install/compile the application according to its documentation, then (b) reserve suitable resources to run it.
[2] Compile/install your code to target specific GPU hardware on Henry2.
Method [1]
Most users will use this method. The application's documentation will specify which version of CUDA and which compute capability (cc) the code should be compiled with.
module avail cuda
shows all of the available CUDA toolkit packages. These should cover any application. For example, if the application requires CUDA toolkit 10.1, then
module load cuda/10.1
will prepare the environment variables so that when you compile your code, the appropriate nvcc, cuda libraries and cuda include files can be found.
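Many applications set the compute capability through their own build system, so follow the application's documentation first. For code compiled by hand, the cc is usually passed to nvcc through its -arch option. The sketch below only prints the command it would run. It is written for bash or sh; the helper cc_to_arch and the source file name nnetworks.cu are hypothetical.

```shell
# Hypothetical helper: turn a compute capability such as 7.5
# into the matching nvcc architecture name, sm_75.
cc_to_arch() {
    printf 'sm_%s\n' "$(printf '%s' "$1" | tr -d '.')"
}

# Build the compile command for cc 7.5.
# nnetworks.cu is an assumed source file name.
ARCH=$(cc_to_arch 7.5)
echo "nvcc -arch=$ARCH -o nnetworks.exe nnetworks.cu"
```

Once the printed command looks right, it can be run directly after module load cuda/10.1.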
Next, run your code. Having compiled with a certain toolkit and a certain cc, find the range of drivers and hardware that support that toolkit and cc by looking at these two tables:
CUDA toolkit - driver compatibility table and
cc - driver compatibility table
For example, CUDA 10.1 requires a driver >= 418.39 (as shown in the first table linked above). The table of GPU nodes below shows that the rtx2080 GPU node can support this application, because its installed driver is 418.74. Next, check the cc. Suppose you compiled your code with cc 7.5. The same table shows that the rtx2080 node supports this cc. Therefore, to run this code, target this node with a batch script like:
#!/bin/tcsh
#BSUB -n 1
#BSUB -W 30
#BSUB -q gpu
#BSUB -R "select[rtx2080]"
#BSUB -gpu "num=1:mode=shared:mps=yes"
#BSUB -o out.%J
#BSUB -e err.%J
module load PrgEnv-pgi
module load cuda/10.1
./nnetworks.exe
Note that mps=yes works for the newer GPUs (rtx2080, gtx1080, p100).
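The driver check in the example above (installed 418.74 against required 418.39) can be sketched as a small version comparison. This is an illustration for bash or sh, not a site-provided tool; the function name driver_at_least is hypothetical.

```shell
# Hypothetical helper: succeed (exit status 0) if the installed
# driver version $1 is at least the required version $2.
# Versions are compared numerically, field by field.
driver_at_least() {
    awk -v have="$1" -v need="$2" 'BEGIN {
        nh = split(have, h, "[.]")
        nn = split(need, n, "[.]")
        max = (nh > nn) ? nh : nn
        for (i = 1; i <= max; i++) {
            a = (i <= nh) ? h[i] + 0 : 0
            b = (i <= nn) ? n[i] + 0 : 0
            if (a > b) exit 0
            if (a < b) exit 1
        }
        exit 0
    }'
}

# rtx2080 driver (418.74) against the CUDA 10.1 minimum (418.39).
if driver_at_least 418.74 418.39; then
    echo "rtx2080: driver OK for CUDA 10.1"
fi
```

The same call with the gtx1080 driver, driver_at_least 390.67 418.39, fails, which is why that node is not a candidate for this example.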
Method [2]
Some users may want to target certain GPUs. For example, suppose a user wants to take advantage of the older GPUs, such as the m2070q and the m2090, because there are so many of them. First, look at the table of GPU nodes below to find the cc and driver for these nodes: the cc is 2.0 and the driver is 390.67. Then look at the
CUDA toolkit - driver compatibility table to see which CUDA toolkit should be used. It shows that CUDA 9.0 and 9.1 are the two most recent toolkits that this driver supports. So when preparing the environment for compiling, use:
module load cuda/9.0
since that is available on our system.
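Also make sure that the code is compiled for cc 2.0, and confirm that the nvcc in the loaded toolkit accepts that target: NVIDIA removed cc 2.x (Fermi) code generation in CUDA 9.0, so compiling for these GPUs may require an older toolkit than the driver table alone suggests.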
After compilation, run the code to target the intended resources with a batch script that might look like:
#!/bin/tcsh
#BSUB -n 1
#BSUB -W 30
#BSUB -q gpu
#BSUB -R "select[m2070q || m2090]"
#BSUB -gpu "num=1:mode=shared:mps=no"
#BSUB -o out.%J
#BSUB -e err.%J
module load PrgEnv-pgi
module load cuda/9.0
./nnetworks.exe
Note that mps=no is used for the older GPUs, as they do not support NVIDIA's Multi-Process Service (MPS).
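The two batch scripts differ mainly in the resource selected and the mps setting. A small helper along these lines could produce the -gpu option string for a given resource type. It is a sketch for bash or sh, and the function name gpu_option is hypothetical. It assumes that any resource type not named above as supporting mps=yes should use mps=no.

```shell
# Hypothetical helper: print the value for "#BSUB -gpu" for a
# given resource type. Only rtx2080, gtx1080 and p100 are named
# above as working with mps=yes; defaulting every other type to
# mps=no is an assumption.
gpu_option() {
    case "$1" in
        rtx2080|gtx1080|p100) mps=yes ;;
        *)                    mps=no ;;
    esac
    echo "num=1:mode=shared:mps=$mps"
}

gpu_option rtx2080
gpu_option m2090
```

The first call prints num=1:mode=shared:mps=yes and the second prints num=1:mode=shared:mps=no, matching the two scripts above.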
List of GPU nodes, their compute capability (cc) and GPU drivers
This information can be obtained with
lshosts -gpu
Resource type   Description               cc    Driver (NVIDIA)
rtx2080         Node with RTX 2080 GPUs   7.5   418.74
gtx1080         Node with GTX 1080 GPUs   6.1   390.67
p100            Node with P100 GPUs       6.0   396.26
k20m            Node with K20m GPUs       3.5   390.67
m2070           Node with M2070 GPUs      2.0   390.67
m2070q          Node with M2070Q GPUs     2.0   390.67
m2090           Node with M2090 GPUs      2.0   390.67
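The lookup described in Methods [1] and [2] can be sketched as a filter over this table. The example below is for bash or sh and hard-codes the values listed above, so it will go stale if the nodes change; lshosts -gpu remains the authoritative source. The function names gpu_table and find_nodes are hypothetical. Matching the cc exactly is a conservative simplification.

```shell
# Copy of the node table above: resource type, cc, driver.
gpu_table() {
cat <<'EOF'
rtx2080 7.5 418.74
gtx1080 6.1 390.67
p100 6.0 396.26
k20m 3.5 390.67
m2070 2.0 390.67
m2070q 2.0 390.67
m2090 2.0 390.67
EOF
}

# Print the resource types whose cc equals $1 and whose driver
# is at least $2. Assumes two-part driver versions (major.minor)
# with a minor number below 1000.
find_nodes() {
    gpu_table | awk -v cc="$1" -v need="$2" '
        function ver(v,   p) { split(v, p, "[.]"); return p[1] * 1000 + p[2] }
        ($2 "") == (cc "") && ver($3) >= ver(need) { print $1 }'
}

# Code built with cc 7.5 and CUDA 10.1 (driver >= 418.39):
find_nodes 7.5 418.39
```

With the table as listed, the call prints rtx2080, and find_nodes 2.0 390.67 prints m2070, m2070q and m2090, one per line.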
Example codes
Use of CUDA on the GPUs is demonstrated with the following example code that adds two vectors.
CUDA C/C++ Example:
ReadMe
C/C++ Makefile
vectorAdd.cu
CUDA for Fortran Example:
ReadMe
Fortran Makefile
Fortran file
Cuda File
Last modified: September 30 2020 18:01:44.