Find out the properties of queues, compute nodes, and GPUs.
Find more info about jobs by using bjobs -l. Note that bjobs -l -p3 gives even more detail: it includes a list of reasons the job is pending, and it may include an estimate of when the job will start, e.g., "Job will start no sooner than indicated time stamp." It is possible that the requested resources are currently in use, but it is also possible that the specified resources do not exist on the system. For example, LSF will not give an error message upon requesting a 64-core node with 500 GB of memory; it will simply wait until such a node is installed, leaving the job pending forever.
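As a sketch, pending-reason lines can be scanned with standard text tools. The sample output below is illustrative only (a hypothetical job ID and reason text; real fields depend on the LSF version and site configuration):

```shell
# Illustrative snippet of pending-job output with a reason line appended,
# similar in shape to what bjobs prints for a pending job.
sample='JOBID   USER    STAT  QUEUE  FROM_HOST  JOB_NAME  SUBMIT_TIME
123456  unityID PEND  gpu    login04    myjob     Mar  3 10:00
 Job requirements for reserving resource (mem) not satisfied;'

# Count pending-reason lines that mention an unsatisfiable resource request.
echo "$sample" | grep -c 'not satisfied'
```

A nonzero count here would suggest checking whether the requested memory or core count actually exists on any node.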
Job priority is determined by several factors including fair share priority, queue priority, and time of submission.
Search for jobs being run by all users and filter for those in that particular queue. For example, to check how many jobs are running in the gpu queue, use
bjobs -u all | grep gpu
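Note that a plain grep also matches pending jobs and any line containing the string "gpu". A slightly stricter sketch, filtering on the STAT and QUEUE columns of hypothetical bjobs output (the jobs and hosts below are invented for illustration):

```shell
# Illustrative bjobs -u all output; column 3 is STAT, column 4 is QUEUE.
sample='JOBID   USER   STAT  QUEUE  FROM_HOST  EXEC_HOST  JOB_NAME  SUBMIT_TIME
1001    alice  RUN   gpu    login04    n3h39      train     Mar  1 09:00
1002    bob    PEND  gpu    login04               sim       Mar  1 09:05
1003    carol  RUN   short  login04    n2e4-3     post      Mar  1 09:10'

# Count only jobs that are in the gpu queue AND actually running.
echo "$sample" | awk '$4 == "gpu" && $3 == "RUN"' | wc -l
```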
You can find which hosts have a GPU by using
lshosts | grep gpu
bqueues will show the number of total jobs in the queue (NJOBS), how many are actually running (RUN), and how many are pending (PEND). MAX is the maximum number of cores available. For some queues, like gpu, the MAX is not shown.
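The columns can be pulled out programmatically. This is a sketch against invented bqueues output (the counts are not from a real system); in LSF's default bqueues layout, PEND and RUN are the 9th and 10th columns:

```shell
# Illustrative bqueues output (hypothetical numbers).
sample='QUEUE_NAME      PRIO STATUS          MAX JL/U JL/P JL/H NJOBS  PEND   RUN  SUSP
gpu              50  Open:Active       -    -    -    -    40    10    30     0'

# Print the pending and running counts for the gpu queue.
echo "$sample" | awk '$1 == "gpu" {print "PEND=" $9, "RUN=" $10}'
```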
bqueues -l gpu
The resources for each node model are given by lshosts. There is currently one P100 node:
[unityID@login04 ~]$ lshosts | grep p100
n3h39 LINUXRH E52650v4 1.0 24 262050M 32767M Yes (gpu twc sse sse2 ssse3 sse4_1 sse4_2 avx avx2 p100)

Here are some commands to find other resources:
[unityID@login04 ~]$ lshosts | grep m2070
[unityID@login04 ~]$ lshosts | grep avx2
[unityID@login04 ~]$ lshosts | grep qc
See LSF Resources for more information on specific resources.
EXEC_HOST is the host group the job is running on. To find out more about the individual hosts available in that group, use bmgroup:
[unityID@login04 ~]$ bmgroup bc2e4
GROUP_NAME HOSTS
bc2e4 n2e4-1 n2e4-2 n2e4-3 n2e4-4 n2e4-5 n2e4-6 n2e4-7 n2e4-8 n2e4-9 n2e4-10 n2e4-11 n2e4-12 n2e4-13 n2e4-14

To find out more about a specific host, e.g., n2e4-3, use
[unityID@login04 ~]$ lshosts | grep n2e4-3
n2e4-3 LINUXRH E5405 1.0 8 16383M 32767M Yes (qc sse sse2 ssse3 sse4_1)

This shows that node n2e4-3 has processor model E5405, 8 cores (2 quad-core processors), and 16 GB of memory; it does not support AVX instructions and does not have InfiniBand (ib).
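The RESOURCES field in parentheses can be tested for specific features. A small sketch, reusing the n2e4-3 line shown above (the check itself is ordinary grep, not an LSF feature):

```shell
# The lshosts line for n2e4-3, copied from the output above.
line='n2e4-3 LINUXRH E5405 1.0 8 16383M 32767M Yes (qc sse sse2 ssse3 sse4_1)'

# Check for the avx and ib resource tags as whole words.
for feat in avx ib; do
  if echo "$line" | grep -qw "$feat"; then
    echo "$feat: yes"
  else
    echo "$feat: no"
  fi
done
```

For this node both checks report "no", matching the description above.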
If a node from the same group is needed, e.g., same blade or same rack on single_chassis, use the -m option.
#BSUB -m "bmgroup"
Example:
#BSUB -m "blade2a1"
If the exact same piece of hardware is needed, meaning the same actual node (host), use the -m option.
#BSUB -m "hostname"
Example:
#BSUB -m "n2e4-3"

Note that this may give very long queue wait times. Additionally, verify that the node is contained in the resource pool or queue to which the job is being submitted.
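Putting the pieces together, a job script pinning a job to one host might look like the sketch below. The queue name, resource limits, and program name are placeholders, not site defaults:

```shell
#!/bin/bash
#BSUB -q single_chassis     # hypothetical queue; verify it contains the host
#BSUB -m "n2e4-3"           # run only on this specific node
#BSUB -n 8                  # all 8 cores of the node
#BSUB -W 60                 # 60-minute wall-time limit (placeholder)
#BSUB -o out.%J             # stdout file, %J expands to the job ID
./my_program                # hypothetical executable
```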
See LSF Resources for more information on specific resources.
If the resource type is not specified, the queuing system will assign a job wherever it might fit. This can result in the job being executed on different types of hardware (new or old, more or fewer cores, etc.). For consistent run times, specify a particular resource. Run times may also differ markedly when nodes are shared with other jobs. For scaling tests, use -x to request exclusive access and avoid contention with other jobs.
See LSF Resources for more information on specific resources.
Suppose a partner queue is named monkey. Run bqueues -l monkey:
[unityID@login04 ~]$ bqueues -l monkey
QUEUE: monkey
-- partner queue

PARAMETERS/STATISTICS
PRIO NICE STATUS      MAX JL/U JL/P JL/H NJOBS PEND RUN SSUSP USUSP RSV PJOBS
100  10   Open:Active 264 -    -    -    0     0    0   0     0     0   0

HOSTS: monkey_ib+10 interconnect_ib+8 blade2h2+4

This shows that there are 264 cores available on the partner queue monkey. The queue has access to the monkey_ib group and also the interconnect_ib group. To find the hardware, use bmgroup:
[unityID@login04 ~]$ bmgroup monkey_ib
GROUP_NAME HOSTS
monkey_ib n2g3-2 n2g3-3 n2g3-4 n2g3-5 n2g3-6 n2g3-7 n2g3-8 n2g3-9 n2g3-10 n2g3-11 n2g3-1

To get more specific hardware info, use lshosts:
[unityID@login04 ~]$ lshosts n2g3-2
HOST_NAME type model cpuf ncpus maxmem maxswp server RESOURCES
n2g3-2 LINUXRH E52650v4 1.0 24 130237M 32767M Yes (twc sse sse2 ssse3 sse4_1 sse4_2 avx avx2 ib)

This shows that the monkey_ib group consists of eleven 24-core nodes that support instructions up to AVX2 and have InfiniBand.
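The group's core count can be tallied from the host list above. A sketch, hard-coding the 24 cores per host seen in the lshosts output (in practice the ncpus value would be read from lshosts for each host):

```shell
# The eleven monkey_ib hosts reported by bmgroup above.
hosts='n2g3-1 n2g3-2 n2g3-3 n2g3-4 n2g3-5 n2g3-6 n2g3-7 n2g3-8 n2g3-9 n2g3-10 n2g3-11'

total=0
for h in $hosts; do
  ncpus=24   # from lshosts; a real script would query each host
  total=$((total + ncpus))
done
echo "total cores: $total"
```

Eleven nodes at 24 cores each gives 264 cores, consistent with the MAX value reported by bqueues for the monkey queue.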
If the queue is not specified in the job script, LSF will attempt to choose the most appropriate queue. To find the queues that a user has access to, use bqueues -u followed by the login name (Unity ID):

bqueues -u unityID

To see the limits of a specific queue, use bqueues -l followed by the queue name:
[unityID@login04 ~]$ bqueues -l debug
MAXIMUM LIMITS:
RUNLIMIT
10.0 min of servlsf

[unityID@login04 ~]$ bqueues -l single_chassis
MAXIMUM LIMITS:
RUNLIMIT
5760.0 min of servlsf

At the time of writing, the run limit for the debug queue was 10 minutes, and the limit for the single_chassis queue was 5760 minutes (4 days). Queue limits are subject to change without notice.
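Since RUNLIMIT is reported in minutes, converting to days is simple arithmetic, as in this small check of the single_chassis figure:

```shell
# RUNLIMIT from bqueues -l is in minutes; convert to days.
runlimit_min=5760
echo "$((runlimit_min / 60 / 24)) days"   # 5760 min = 96 h = 4 days
```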
LSF will report an error if you request more processors or wall time than a queue allows, but if you specify too much memory, the job is submitted and then pends forever without running.
Most default queues contain nodes of all memory sizes. Some non-default queues may be more limited, and some partner queues may contain larger memory nodes. See this documentation on memory resources for the most current information.
Instructions on finding specs are also listed in the FAQ about finding the types of hardware available in a queue.