Be sure to use resources efficiently with hybrid OpenMP-MPI jobs.
Hazel is a heterogeneous cluster, consisting of many different node types. When specifying ptile and OMP_NUM_THREADS, keep in mind that nodes have from 8 to 32 cores. The example script requests an 8-core node, since the code can only use 8 cores per node.
See the following video segment on finding specs for nodes on the cluster.
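For a quick check from a login node, LSF's host query commands also report per-node core counts (a sketch; the exact output columns can vary by LSF version and site configuration):

# List compute hosts and their CPU counts (ncpus column)
lshosts

# Show the per-host job slot limits as seen by the scheduler
bhosts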
If the code works optimally when using the maximum number of cores on the node in shared memory (OpenMP), then use ptile=1 and set OMP_NUM_THREADS to all the cores on the node.
In the following script, 1 MPI task per node is specified. The command nproc --all gives the total number of cores on the compute node, and OMP_NUM_THREADS is set to that value. This ensures all the cores are used, regardless of node type.
#!/bin/bash
#BSUB -n 6                 # Number of MPI tasks
#BSUB -R span[ptile=1]     # MPI tasks per node
#BSUB -x                   # Exclusive use of nodes
#BSUB -J chemtest1         # Name of job
#BSUB -W 2:30              # Wall clock time
#BSUB -o chemtest1.out.%J  # Standard out
#BSUB -e chemtest1.err.%J  # Standard error
module load openmpi-gcc    # Set environment
export OMP_NUM_THREADS=`nproc --all`
mpirun ./chemtest1.exe
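To run the script, save it to a file and submit it to LSF (a sketch; the filename submit_chemtest1.sh is arbitrary):

# Submit the job script; LSF assigns a job ID, and %J in -o/-e expands to that ID
bsub < submit_chemtest1.sh

# Check whether the job is still in PEND or has started running
bjobs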
If the code works optimally when using more MPI tasks and a smaller number of threads per MPI task, then use environment variables to set OMP_NUM_THREADS to a portion of the cores on the node.
In the following script, 2 MPI tasks per node are specified. Each of those 2 MPI tasks can only use half of the cores on the compute node. The command nproc --all gives the total number of cores on the compute node, and the line halfCores=$((`nproc --all`/2)) assigns half that number to the variable halfCores, which is then used as the value of OMP_NUM_THREADS. This ensures all the cores are used, regardless of node type.
#!/bin/bash
#BSUB -n 6                 # Number of MPI tasks
#BSUB -R span[ptile=2]     # MPI tasks per node
#BSUB -x                   # Exclusive use of nodes
#BSUB -J chemtest1         # Name of job
#BSUB -W 2:30              # Wall clock time
#BSUB -o chemtest1.out.%J  # Standard out
#BSUB -e chemtest1.err.%J  # Standard error
module load openmpi-gcc    # Set environment
halfCores=$((`nproc --all`/2))
export OMP_NUM_THREADS=$halfCores
mpirun ./chemtest1.exe
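The divide-by-two arithmetic generalizes to other task counts. A minimal sketch, assuming the TASKS_PER_NODE value is kept in sync with the ptile value in the #BSUB -R line:

# Divide the node's cores evenly among the MPI tasks on that node.
# TASKS_PER_NODE must match the ptile value requested above (here, 2).
TASKS_PER_NODE=2
export OMP_NUM_THREADS=$(($(nproc --all)/TASKS_PER_NODE))
mpirun ./chemtest1.exe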
If the code scales to a large number of cores (many nodes), this script allows the user to drop the restriction of exclusive use of the nodes (a job that requires many exclusive nodes may remain in the PEND state for a long time). OMP_NUM_THREADS is controlled by the ptile scheduler option.
In the following script, a total of 80 threads run: the scheduler reserves 8 cores on each of 10 nodes, one MPI task runs on each node, and each MPI task spawns 8 threads. Normally #BSUB -n 80 would mean a total of 80 MPI tasks, but here the default -n parameter for mpirun is overridden (see mpirun -n $numNodes below). The number of threads run on each node (OMP_NUM_THREADS) is limited to the ptile value, and this control is what allows the exclusivity restriction to be dropped.
#!/bin/bash
#BSUB -n 80                # Normally this is the # of MPI tasks, but here -n/ptile is the # of MPI tasks - note the special mpirun arguments below
#BSUB -R span[ptile=8]     # Threads per node; the -n value above should be divisible by the ptile value
#BSUB -J chemtest1         # Name of job
#BSUB -W 2:30              # Wall clock time
#BSUB -o chemtest1.out.%J  # Standard out
#BSUB -e chemtest1.err.%J  # Standard error
module load PrgEnv-intel/2020.2.254   # Set environment
export Num=$((`echo "$LSB_SUB_RES_REQ" | awk -F 'ptile=' '{print $2}' | awk -F ']' '{print $1}'`))
echo "$LSB_SUB_RES_REQ"
echo "$Num"
export OMP_NUM_THREADS=$Num           # Set number of threads per node to ptile
echo "$OMP_NUM_THREADS"
numNodes=$(($LSB_DJOB_NUMPROC/$OMP_NUM_THREADS))
echo $numNodes
cat $LSB_DJOB_HOSTFILE | uniq | sed 's/$/:1/' > mf   # Create a machinefile called mf
export cmdstring="mpirun -n $numNodes -machinefile mf ./chemtest1.exe"
$cmdstring
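After the job finishes, the echoed values and the generated machinefile can be used to confirm the layout (a sketch; replace 123456 with the job ID reported by bsub):

# The standard-out file records the resource request, the ptile value,
# OMP_NUM_THREADS, and the computed node count
cat chemtest1.out.123456

# The machinefile mf should list 10 unique hosts, each suffixed with :1
cat mf
wc -l mf    # expect 10 lines for the -n 80 / ptile=8 request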