A parallel job uses more than one core. The example R program was a simple program that created a PDF file. It is a serial program, meaning it can run on only one core. It cannot run on multiple cores, because nowhere in the program does it specify how to do so; a program can only do what the programmer tells it to do. In this case, we wrote weather.R ourselves, so we are sure it is serial code.
If we didn't write the code ourselves, we have to read the documentation to determine whether it can run in parallel. The details of parallel programming are beyond the scope of a Quick Start, but in general, a popular application written recently is probably multithreaded and uses shared memory. Shared memory codes are parallel but can run on only one node. If an application is capable of multithreading, it may use multiple cores whether or not you explicitly told it to do so.
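One quick empirical check: while a program is running, ask Linux how many threads its process has. This is a generic check, not specific to any application; the name hello_shared below is just a placeholder:

# Find the process ID of the running program (hello_shared is a placeholder)
pgrep hello_shared
# NLWP reports the number of threads belonging to that process ID
ps -o pid,nlwp,comm -p <PID>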
A program that can run on more than one node has distributed memory parallelization. Check the documentation to confirm that your application can run on multiple nodes; as a rule of thumb, a program is not using distributed memory unless it is launched with mpirun (or a similar MPI launcher) before the executable name.
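For example, a distributed memory run of a hypothetical executable my_app on 4 tasks looks something like this (the executable name is an illustration only):

mpirun -n 4 ./my_app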
More information on finding out whether or not your code is parallel is available in the HPC documentation.
Remember, the first rule of the AUP is "play nice with others". If you know your job is multithreaded or takes a lot of memory, make sure the job is confined to a single node that is not being shared with others. This can be done by specifying hosts=1 and using the exclusive option:
#BSUB -R span[hosts=1]
#BSUB -x

When using a queue that does not allow the exclusive option, be specific enough in your request that you fill all the cores on the node. For example, the following requests 8 cores (-n 8), requires that all 8 cores be placed on one node (ptile=8), and requires a node that has 8 cores (select[qc], a node with two quad-core processors):
#BSUB -n 8
#BSUB -R span[ptile=8]
#BSUB -R select[qc]

Remember that the AUP also specifies that a job should make efficient use of resources. While using the exclusive option ensures that you are not affecting others, you might still be over- or undersubscribing a node. To learn more about how to examine this, continue with this exercise, and consult HPC staff if you have further questions.
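Putting those directives together, a minimal sketch of a complete script for filling an 8-core node on a queue without the exclusive option might look like this (./my_app and the output file name are placeholders, not site-specific values):

#!/bin/bash
#BSUB -n 8                # request 8 cores
#BSUB -R span[ptile=8]    # place all 8 cores on one node
#BSUB -R select[qc]       # select a node with two quad-core processors
#BSUB -W 10               # 10-minute wall-clock limit
#BSUB -o out.%J           # output file, named with the job ID
./my_app                  # placeholder for your program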
These examples use a simple Fortran/MPI/OpenMP code. You do not have to know any of those languages: launching the LSF script will load the modules, compile the code, and run it. We are using this code because it demonstrates some important aspects of parallel jobs.
Both codes are simple "Hello World" examples. The shared memory code uses only threads, so it can run on only one node. Each thread says "hello" and prints the name of the node it is running on. Recall from an earlier example how to print out the name of the node:
echo $HOSTNAME
The code essentially performs this echo command, and in the shared memory example, every thread is on the same node. In the MPI version, the code launches tasks, and each task spawns threads. Each thread prints out which task it was spawned from and which node it is running on. In this case, the "hello" should come from more than one node...unless something is wrong!
In addition to saying hello, each code does a simple calculation in a very long loop. This is just so that the program doesn't exit immediately. It runs for about 30 seconds.
Copy the example directory and move into it:

cp -r /usr/local/apps/samples/guide/parallel .
cd parallel
Look at the script submit_shared.sh. It requests 5 minutes of wall-clock time, with all cores on a single node (span[hosts=1]) and exclusive use of a node that has at least 8 cores. The environment variable OMP_NUM_THREADS is what controls the threading behavior.
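The actual script is in the directory you copied; a minimal sketch of such a script, assuming the compiled executable is named hello_shared and leaving out the module loading and compilation steps, might look like:

#!/bin/bash
#BSUB -W 5                  # 5 minutes of wall-clock time
#BSUB -n 8                  # reserve 8 cores
#BSUB -R span[hosts=1]      # all cores on a single node
#BSUB -x                    # exclusive use of the node
#BSUB -o shared_out.%J      # output file, named with the job ID
# (module loading and compilation steps omitted)
export OMP_NUM_THREADS=4    # the code will run 4 threads
./hello_shared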
Submit the job:

bsub < submit_shared.sh

The output will look something like this:
Hello from thread 4 on host n2e6-6
Hello from thread 1 on host n2e6-6
Hello from thread 3 on host n2e6-6
Hello from thread 2 on host n2e6-6

Notice that even though we requested 8 cores from LSF, the code only used 4. LSF only reserves the cores. It is up to the programmer (or the user) to dictate how many cores are actually used.
To watch the threads yourself, start an interactive session with exclusive use of a node:

bsub -Is -n 1 -x -W 10 bash
Check which node you landed on:

echo $HOSTNAME
Check how many cores are on the node:

lscpu
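If you only want the core counts, you can filter the lscpu output; this is plain grep, nothing site-specific:

lscpu | grep -E '^CPU\(s\)|^Thread|^Core|^Socket'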
Set the number of threads, run the code in the background, and watch it with htop:

export OMP_NUM_THREADS=4
./hello_shared &
htop

The command htop should show that there are 4 threads running. htop is dynamic, and sometimes it isn't clear how many threads are running at once. For a single point-in-time view (-n 1) that includes not just the executable but also the threads (-H), use top:
[Use Control-C to exit htop]

top -n 1 -H
You should see something like:
  P   PID USER    PR  NI   VIRT  RES  SHR S  %CPU %MEM   TIME+ COMMAND
  4 18084 lllowe  30  10 222192  864  696 R  99.9  0.0 0:10.53 hello_shared
  3 18085 lllowe  30  10 222192  864  696 R  99.9  0.0 0:10.51 hello_shared
  5 18086 lllowe  30  10 222192  864  696 R  99.9  0.0 0:10.53 hello_shared
  1 18087 lllowe  30  10 222192  864  696 R  99.9  0.0 0:10.53 hello_shared
  7 18101 lllowe  30  10 162532 2760 1544 R  12.5  0.0 0:00.02 top
Exit the interactive session:

exit
You should still be in the parallel directory. If not, go back to it:
cd /share/$GROUP/$USER/guide/parallel
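Look at the script submit_mpi.sh before submitting it. Judging from the reservation and output shown below (6 MPI tasks spread 2 per node across 3 nodes, with 4 threads per task), a minimal sketch of such a hybrid MPI/OpenMP script, again omitting the module loading and compilation steps, might look like:

#!/bin/bash
#BSUB -W 5                 # 5 minutes of wall-clock time
#BSUB -n 6                 # reserve 6 cores, one per MPI task
#BSUB -R span[ptile=2]     # place 2 tasks on each node
#BSUB -o mpi_out.%J        # output file, named with the job ID
# (module loading and compilation steps omitted)
export OMP_NUM_THREADS=4   # each MPI task spawns 4 threads
# mpirun typically picks up the task count and host list from LSF;
# some sites require additional launcher flags
mpirun ./hello_mpi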
Submit the MPI job:

bsub < submit_mpi.sh
While the job runs, use bjobs to check how LSF reserved the nodes:

bjobs -l

There should be 3 nodes reserved with 2 tasks each:
Tue Feb  4 11:10:35: Started on 6 Hosts/Processors <2*n3l4-12> <2*n3l4-11> <2*n3l4-14>, Execution Home , Execution CWD ;

The LSF output should contain something like:
Hello from thread 4 from MPI Task 1 on host n3g1-7
Hello from thread 1 from MPI Task 2 on host n3g1-7
Hello from thread 2 from MPI Task 5 on host n3g1-11
Hello from thread 1 from MPI Task 1 on host n3g1-7
Hello from thread 1 from MPI Task 6 on host n3g1-11
Hello from thread 3 from MPI Task 2 on host n3g1-7
Hello from thread 4 from MPI Task 5 on host n3g1-11
Hello from thread 2 from MPI Task 6 on host n3g1-11
Hello from thread 1 from MPI Task 5 on host n3g1-11
Hello from thread 4 from MPI Task 3 on host n3g1-12
Hello from thread 4 from MPI Task 4 on host n3g1-12
Hello from thread 4 from MPI Task 6 on host n3g1-11
Hello from thread 3 from MPI Task 6 on host n3g1-11
Hello from thread 3 from MPI Task 5 on host n3g1-11
Hello from thread 3 from MPI Task 1 on host n3g1-7
Hello from thread 3 from MPI Task 4 on host n3g1-12
Hello from thread 2 from MPI Task 2 on host n3g1-7
Hello from thread 4 from MPI Task 2 on host n3g1-7
Hello from thread 2 from MPI Task 1 on host n3g1-7
Hello from thread 1 from MPI Task 3 on host n3g1-12
Hello from thread 2 from MPI Task 4 on host n3g1-12
Hello from thread 2 from MPI Task 3 on host n3g1-12
Hello from thread 1 from MPI Task 4 on host n3g1-12
Hello from thread 3 from MPI Task 3 on host n3g1-12
Note: we are controlling the number of threads with OMP_NUM_THREADS. Your application probably does not use this variable (unless you wrote it with OpenMP or it uses threaded libraries). Consult your application's documentation to learn how to control its threading behavior.
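For instance, some common math libraries honor their own environment variables; the two below are recognized by OpenBLAS and Intel MKL respectively, but check your library's documentation before relying on them:

export OPENBLAS_NUM_THREADS=4
export MKL_NUM_THREADS=4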
For more advice on examining, testing, and controlling the threading behavior of an application, see the HPC documentation.