Nodes on the cluster have different memory sizes. See this segment from the Parallel Jobs video on finding the specs on the cluster.
The memory sizes of the majority of nodes on the cluster, listed in GB, are:
64, 128, 192, 256, 512*, 1024*
*Nodes with these higher amounts of memory are limited outside of partner queues.
When requesting memory based on the RAM increments of available nodes on the cluster, request somewhat less than the full amount, since OS processes also consume memory. For example, if a 128GB node (or larger) is desired, request 120GB of memory.
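As a minimal sketch (the rusage syntax is described in more detail below), a memory request sized for a 128GB node might look like:
#BSUB -R "rusage[mem=120]"   # leave headroom for the OS on a 128GB node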
Open the standard output file, e.g. stdout.[JOBID], and scroll to the bottom, which should contain something like:
Resource usage summary:
    CPU time :       209542.33 sec.
    Max Memory :     15725.28 MB (*)
    Average Memory : 10982.08 MB
(*) Max Memory is how you will determine how much memory to request in future job scripts.
If your jobs' LSF output files follow the naming convention stdout.[JOBID], then do
grep "Max Memory" stdout*
To find information about all your running jobs, do:
bjobs -r -X -o "jobid queue cpu_used run_time avg_mem max_mem slots delimiter=','"
This will return a CSV-formatted list of your jobs showing the job ID, queue, total CPU time, elapsed wall clock time, average memory utilized, maximum memory utilized, and the number of cores reserved.
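To keep a record that can be opened in a spreadsheet, the same command can be redirected to a file (the filename here is just an example):
bjobs -r -X -o "jobid queue cpu_used run_time avg_mem max_mem slots delimiter=','" > running_jobs.csv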
Check the max memory for your jobs, and make sure it does not exceed the amount requested or take up a significant portion of the memory on the assigned node. You can check how much memory your assigned node has by using lshosts. For example, if your job is running on node n3m3-1, you can find the memory by doing:
lshosts | grep n3m3-1
To find the nodes assigned to your job, do:
bjobs -l [JOBID]
Also check whether the maximum memory of the job steadily increases with time. If it does, look at the code or documentation and decide whether that is expected behavior. If not, it might indicate a problem with the application, such as a memory leak.
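One way to watch for steady growth is to poll the scheduler periodically. The sketch below uses a hypothetical job ID of 12345 and assumes an LSF version in which bjobs accepts -noheader (as well as the -o option shown above):
# Record the job's memory usage every 5 minutes while it is running
while bjobs -noheader -o "stat" 12345 2>/dev/null | grep -q RUN; do
    echo "$(date): $(bjobs -noheader -o 'max_mem avg_mem run_time' 12345)"
    sleep 300
done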
You may be able to estimate how much memory will be required for your job based on the following:
Once the memory requirements have been estimated, they can be specified in an LSF script by using rusage. Please check the link to using rusage for more information.
Usage requests are per host, and the default unit is GB. To request that your job be assigned to a node or a set of nodes each having at least 64GB of RAM, do
#BSUB -R "rusage[mem=64]"
or
#BSUB -R "rusage[mem=64GB]"
As noted above, when requesting memory based on the RAM increments of available nodes on the cluster, request somewhat less than the full amount, since OS processes also consume memory. For example, if a 128GB node (or larger) is desired, request 120GB of memory.
As explained in the video on the Acceptable Use Policy, your job must not interfere with other users, and it should make efficient use of resources.
If your job will take most of the memory resources of a node, use -x to request exclusive use of the node. It is also appropriate to use -x when running the tests to measure how much memory is needed.
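In a job script this is a single additional directive, e.g. (a minimal sketch, reusing the memory value from the example above):
#BSUB -x                          # exclusive use of the assigned node(s)
#BSUB -R "rusage[mem=120]"        # memory request still applies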
For production runs (e.g., submitting many simultaneous jobs), it is inappropriate to request -x when your job does not need it.
On the other hand, for production runs, you may want to ensure that other users are not placed on your node. Using -x may be appropriate in this case, but you should contact staff for assistance in creating LSF batch scripts that ensure -x is applied only to the subset of resources appropriate to your job. Staff may also be able to help bundle jobs so that only your jobs occupy the nodes you are assigned.