Submitting multiple batch scripts to LSF
Many workflows involve submitting multiple compute jobs with slightly different parameters. Users can create a more efficient and reproducible workflow by using LSF job arrays or shell scripts to automate submission to LSF. Some basic scripts are provided as examples.
Caution!
A faulty automated submission script can cause serious problems, including:
- Infinite loops that crash LSF
- Too many jobs creating enough simultaneous I/O to crash the file system
- Taking all available software licenses, making them unavailable to other users
In those cases, a user will be REQUIRED to kill ALL jobs.
- To kill all of your jobs, use
bkill 0
Before using an automated submission script:
- Test the logic of codes and scripts by adding echo statements and running with the bsub command commented out.
- Always do an initial test that submits only a single job. Next, test with only a few jobs for a short amount of time, and monitor the usage and output.
- When in doubt, contact HPC staff.
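The first test in the list above can be sketched as follows. This is a minimal example under stated assumptions: the job script my_job.sh, the job names, and the parameter values 1 to 3 are hypothetical placeholders. The bsub line is commented out, so running the script only prints the commands that would be submitted.

```shell
#!/bin/bash
# Dry-run sketch: print each submission command instead of running it.
# my_job.sh and the parameter values are hypothetical placeholders.

submit_all() {
    local n
    for n in 1 2 3; do
        # Show exactly what would be submitted for this parameter.
        echo "bsub -n 1 -W 00:10 -J test_${n} ./my_job.sh ${n}"
        # After checking the echoed commands, uncomment the next line to submit.
        # bsub -n 1 -W 00:10 -J "test_${n}" ./my_job.sh "${n}"
    done
}

submit_all
```

Once the printed commands look correct, shorten the loop to a single value and uncomment the bsub line for the initial one-job test.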
Job arrays for multiple job submissions
LSF job arrays allow a user to submit multiple jobs to LSF as defined by an array of integers. If the workflow allows for inputs, outputs, and parameters to be fully characterized by a single number, job arrays are the most efficient way of submitting multiple jobs.
In the sample batch script below, LSF will spawn 25 serial jobs, and each will execute the line source ./echo_hostname.sh $LSB_JOBINDEX. Here, $LSB_JOBINDEX is the job array index (an integer from 1 to 25) for each job.
#!/bin/bash
#BSUB -J My_array[1-25]      #job name AND job array
#BSUB -n 1                   #number of cores
#BSUB -W 00:10               #walltime limit: hh:mm
#BSUB -o Output_%J_%I.out    #output - %J is the job-id %I is the job-array index
#BSUB -e Error_%J_%I.err     #error - %J is the job-id %I is the job-array index
source ./echo_hostname.sh $LSB_JOBINDEX
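When a run depends on more than one parameter, the array index can still characterize it if the index is decoded inside the job. The following is a minimal sketch, assuming a hypothetical 5 x 5 grid of two parameters; the function name decode_index and the value lists are illustrative and are not part of the example scripts.

```shell
#!/bin/bash
# Sketch: decode a job array index (1-25) into a pair of parameters.
# The two value lists below are hypothetical placeholders.

temps=(280 290 300 310 320)
pressures=(1 2 5 10 20)

decode_index() {
    local idx=$(( $1 - 1 ))                 # shift to a 0-based index
    local n=${#pressures[@]}                # length of the inner list
    local t=${temps[$(( idx / n ))]}        # slow-varying parameter
    local p=${pressures[$(( idx % n ))]}    # fast-varying parameter
    echo "$t $p"
}

# LSF sets LSB_JOBINDEX inside an array job; default to 1 for local testing.
decode_index "${LSB_JOBINDEX:-1}"
```

Running the script outside LSF with LSB_JOBINDEX set by hand is a quick way to check the mapping before submitting the array.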
Job array - serial example
The script job_array_serial.sh submits 25 jobs that run the program echo_hostname.sh, which echoes the hostname of the node it is running on and its index within the job array. To use, type bsub < job_array_serial.sh. The scripts and the resulting output can be viewed here:
job_array_serial.sh
echo_hostname.sh
output
To avoid copy/paste errors, please copy these files from the apps directory:
/usr/local/apps/examples/scripts/job_arrays
Job array - parallel example
The script job_array.sh demonstrates that multiple parallel jobs may be submitted in the same manner as multiple serial jobs. The sample code hello_omp.F90 is a hybrid MPI-OpenMP example which echoes the hostname of the node that each thread is running on. To use, first compile the sample code, and then type bsub < job_array.sh. The script job_array.sh contains instructions for compiling the sample code. The scripts and the resulting output can be viewed here:
job_array.sh
hello_omp.F90
output
To avoid copy/paste errors, please copy these files from the apps directory:
/usr/local/apps/examples/scripts/job_arrays
Sample automation scripts using loops to submit multiple jobs to LSF
If a workflow needs something more complex than the job arrays described above can provide, job submission may be automated with shell scripts that call bsub in a loop.
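One case that does not map naturally onto an integer index is a set of inputs identified by name. The sketch below generates one job script per input file with a here-document and passes it to bsub on standard input. It is an illustration under stated assumptions, not one of the provided example scripts: the input file names, the program run_model, and the SUBMIT variable are hypothetical. SUBMIT defaults to cat, so the generated job scripts are only printed.

```shell
#!/bin/bash
# Sketch: submit one job per input file using a here-document.
# SUBMIT defaults to "cat" so the generated job scripts are only printed;
# set SUBMIT=bsub to submit for real. File names and run_model are hypothetical.

SUBMIT=${SUBMIT:-cat}

make_and_submit() {
    local input=$1
    local name
    name=$(basename "$input" .dat)      # job name derived from the file name
    # %J is left literal for LSF; ${name} and ${input} are expanded by bash.
    $SUBMIT <<EOF
#!/bin/bash
#BSUB -J ${name}
#BSUB -n 1
#BSUB -W 00:10
#BSUB -o ${name}_%J.out
#BSUB -e ${name}_%J.err
./run_model ${input}
EOF
}

for input in inputs/case_a.dat inputs/case_b.dat; do
    make_and_submit "$input"
done
```

Keeping the submit command in a variable means the same script serves as both the dry run and the real submission.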
Basic script for multiple job submissions
The script multiple_jobs.sh uses bsub to run the program run.sh, which echoes the hostname of the node that it is running on. It can also be used to test whether an LSF batch script will distribute jobs to the intended hosts. The scripts and the resulting output can be viewed here:
multiple_jobs.sh
run.sh
output
To avoid copy/paste errors, please copy these files from the apps directory:
/usr/local/apps/examples/scripts/basic/
R script for multiple job submissions
The script R_loops.sh uses bsub and Rscript to define various years and models and then run the R script codehpc.R for each scenario. The scripts and the resulting output can be viewed here:
R_loops.sh
codehpc.R
output
To avoid copy/paste errors, please copy these files from the apps directory:
/usr/local/apps/examples/scripts/R_loops/
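The general shape of a loop over years and models can be sketched as below. This is not the contents of R_loops.sh: the years, the model names, the MAX_JOBS limit, and the idea that codehpc.R takes the year and model as command-line arguments are all assumptions made for illustration, and Rscript is assumed to be available in the job environment. The sketch counts the scenarios before printing any commands, as a guard against submitting far more jobs than intended.

```shell
#!/bin/bash
# Sketch: build one submission command per (year, model) scenario.
# The years, model names, MAX_JOBS, and the command-line arguments passed to
# codehpc.R are assumptions for illustration, not the contents of R_loops.sh.

years=(2000 2010 2020)
models=(modelA modelB)
MAX_JOBS=10          # hypothetical safety limit on the number of submissions

build_commands() {
    local year model
    for year in "${years[@]}"; do
        for model in "${models[@]}"; do
            echo "bsub -n 1 -W 00:30 -J ${model}_${year} Rscript codehpc.R ${year} ${model}"
        done
    done
}

# Count the scenarios first and stop if the loop would create too many jobs.
count=$(build_commands | grep -c '^bsub ')
if [ "$count" -gt "$MAX_JOBS" ]; then
    echo "Refusing to submit $count jobs (limit $MAX_JOBS)" >&2
    exit 1
fi

# Print the commands for review; pipe them to bash only after checking them.
build_commands
```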