module avail
If you are using a custom version of R installed with the Conda package manager, first read the section on using R with Conda before running R.
To set up the environment with the default R module, run:
module load R
Do not run R on a login node for anything other than compiling new R packages or other non-computational tasks. For interactive testing and debugging, use an interactive session on a compute node. To confirm that the session is not on a login node, run:
hostname
Here is an example batch script for a serial job. For more information on submitting batch scripts, see the documentation on running jobs.
To run R in batch mode using an R script called my_program.R, create a text file called submit.sh containing:
#!/bin/bash
#BSUB -W 20
#BSUB -n 1
##BSUB -R "rusage[mem=??GB]"   #Specify maximum memory required
##BSUB -x                      #Use exclusive only if necessary
#BSUB -o out.%J
#BSUB -e err.%J
module load R
Rscript my_program.R
The job can be submitted as
bsub < submit.sh
The script submit.sh requests one core for 20 minutes. The lines for specifying memory requirements and exclusive node use are included for reference but commented out: LSF ignores lines that begin with ##BSUB, so remove one # to enable an option. Check the documentation on LSF options to customize the batch script, and see the documentation on requesting memory resources if the job is expected to be memory intensive.
For an example of submitting multiple R jobs, see the sample R script for multiple job submissions. The script defines various years and models, and then uses bsub and Rscript to submit a separate job for each scenario.
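The sample mentioned above is written in R. As a rough bash alternative, the sketch below writes one batch script per (year, model) scenario in the same format as submit.sh and submits each one with bsub < file. The year and model values, the job names, and the assumption that my_program.R reads the scenario as command-line arguments (for example with commandArgs(trailingOnly = TRUE)) are all hypothetical. By default the sketch only writes the scripts; set SUBMIT=1 to also submit them.

```shell
#!/bin/bash
# Sketch: write one LSF job script per (year, model) scenario and optionally submit it.
# The years, models, and my_program.R arguments are hypothetical placeholders.
years="2000 2010 2020"
models="modelA modelB"
SUBMIT=${SUBMIT:-0}   # 0 = only write the scripts (dry run); 1 = also submit them

for year in $years; do
  for model in $models; do
    job="R_${model}_${year}"
    script="submit_${job}.sh"
    # Same layout as submit.sh, with the scenario passed to Rscript as arguments.
    # %J is left for LSF to replace with the job ID.
    cat > "$script" <<EOF
#!/bin/bash
#BSUB -J ${job}
#BSUB -W 20
#BSUB -n 1
#BSUB -o out.${job}.%J
#BSUB -e err.${job}.%J
module load R
Rscript my_program.R ${year} ${model}
EOF
    if [ "$SUBMIT" -eq 1 ]; then
      bsub < "$script"
    fi
  done
done
```

Writing each script to a file (rather than passing a long command string to bsub) keeps every job in the same form as submit.sh and leaves a record of exactly what was submitted.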
Running R with shared memory means that the program runs on a single node; there is no communication across nodes. R programs that do not use MPI have shared memory parallelism only. For example, the following requests 8 cores, and span[hosts=1] ensures that all of them are on the same node. Uncommenting the -x line additionally requests exclusive use of that node.
#BSUB -n 8
#BSUB -R span[hosts=1]
##BSUB -x    #Uncomment if program will spawn additional threads
Many parallel R libraries and functions automatically start as many threads as there are cores on the node. Either modify the R code to limit the number of cores used to the number requested, or request exclusive use of the node.
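One way to keep an R job within its allocation is to pass the number of requested cores to R explicitly. The fragment below is a sketch meant to go in the batch script before the Rscript command. It relies on LSF setting LSB_DJOB_NUMPROC to the number of slots allocated to the job; the variable name NCORES is a hypothetical choice. Whether the OMP_NUM_THREADS, OPENBLAS_NUM_THREADS, and MKL_NUM_THREADS settings have any effect depends on how R and each package were built, and functions such as parallel::mclapply take an explicit mc.cores argument instead, so the R script can read the value with as.integer(Sys.getenv("NCORES")).

```shell
# Place these lines in the batch script before the Rscript command.
# LSF sets LSB_DJOB_NUMPROC to the number of slots allocated to the job;
# default to 1 when it is unset (e.g. when testing outside of LSF).
NCORES=${LSB_DJOB_NUMPROC:-1}
export NCORES   # hypothetical name; read in R with Sys.getenv("NCORES")

# Cap common threaded back ends (OpenMP, OpenBLAS, MKL) at the allocated core count.
export OMP_NUM_THREADS=$NCORES
export OPENBLAS_NUM_THREADS=$NCORES
export MKL_NUM_THREADS=$NCORES
echo "Limiting threaded libraries to $NCORES core(s)"
```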
To use Rmpi, load both the R module and the OpenMPI module:
module load R
module load openmpi-gcc/openmpi1.8.4-gcc4.8.2
Rmpi spawns its own MPI tasks, so the R script must be launched as a single task. For example, an Rmpi job that uses 64 tasks would use the following:
#BSUB -n 64
#....
module load R
module load openmpi-gcc/openmpi1.8.4-gcc4.8.2
mpirun -n 1 Rscript my_program.R
Please see and run the sample Rmpi code and batch submission script, Parallel Hello World, before using a script with Rmpi.
An interactive session may be used for short debugging sessions or to prepare a production batch script. Keep interactive sessions to a minimum and use them only when necessary. Long-running interactive sessions that leave nodes idle or underutilized may be terminated.
The following requests an interactive session with 1 core for 10 minutes. The -x option requests the entire node (exclusive use). Memory-intensive R jobs (e.g. manipulating GeoTIFFs or rasters) should request the entire node regardless of the number of cores needed. After the interactive session begins, load the R module and start R:
bsub -Is -n 1 -x -W 10 bash
module load R
R
To exit R, type
quit()
To exit the interactive session on the compute node, type
exit
Some R packages have complex dependencies on external libraries that require specific versions and build options. When using install.packages, a user may have to install those dependencies manually. In this case, using a Conda environment is preferable to install.packages. See the instructions for installing software with Conda for general details about the procedure, including an example YAML file, rlibs.yml, for installing a set of R packages. The packages listed in rlibs.yml are only an example; not every environment needs them.
Create a YAML file that lists every package loaded by a library call in the set of R scripts. Most R packages are available from Conda under names of the form r-packagename. Search to confirm whether a Conda package exists for each R package and, if so, to check its name and preferred channel. If an R package is not available from Conda, install it with install.packages from within R while the Conda environment containing the other Conda-installed packages is active.
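Building the YAML file by hand can be tedious for a large set of scripts. The bash sketch below collects the package names passed to library() or require() in the given R scripts and prints a Conda environment file that requests r-<name> (lower-cased) for each one. The function name r_to_conda_yaml, the environment name rlibs, and the conda-forge channel are assumptions. The pattern only catches simple calls: commented-out calls are matched too, and packages loaded through a variable are not resolved. The generated names still need to be confirmed with a Conda search; Bioconductor packages, for instance, are usually published under a bioconductor- prefix rather than r-.

```shell
#!/bin/bash
# Sketch: print a Conda environment file listing r-<name> for every package
# loaded with library() or require() in the R scripts given as arguments.
# Assumptions: environment name "rlibs", channel conda-forge, lower-case names.
r_to_conda_yaml() {
  # Fixed header; r-base provides R itself.
  cat <<'EOF'
name: rlibs
channels:
  - conda-forge
dependencies:
  - r-base
EOF
  # 1. grep: keep the text from "library(" or "require(" up to the first comma or ")".
  # 2. sed: drop the function name and any quotes or spaces.
  # 3. tr/sort: lower-case the names and remove duplicates.
  # 4. sed: format each name as a YAML list item.
  grep -hoE '(library|require)\([^,)]+' "$@" \
    | sed -E 's/^(library|require)\(//; s/["'\'' ]//g' \
    | tr '[:upper:]' '[:lower:]' \
    | sort -u \
    | sed 's/^/  - r-/'
}

# Example (hypothetical file names):
#   r_to_conda_yaml analysis1.R analysis2.R > rlibs.yml
```

The resulting file can then be reviewed, corrected where a search shows a different package name or channel, and used with conda env create -f rlibs.yml as described in the Conda instructions.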
For bioinformatics, Bioconductor provides a large collection of R packages. Install the newest base R in a new Conda environment, install the BiocManager package with install.packages("BiocManager") within that environment, and then install Bioconductor packages with BiocManager::install(). See the Bioconductor documentation for more details and the latest version.
When R packages are installed with Conda, the system R is not used; instead, Conda installs its own version of R that is compatible with the requested packages. Therefore, do not run module load R when using a Conda-based R environment. Instead, use conda activate as described in the Conda documentation. Additionally, when using R in a Conda environment, remove any hard-coded paths in the R scripts that point to libraries installed with the system R, and remove the file ~/.Renviron.
Most parallel R packages work with shared memory only, and such jobs must be submitted with
#BSUB -R span[hosts=1]
See the video on Parallel Jobs for more information about shared versus distributed memory.
The Rmpi package allows users to parallelize their scripts with MPI. MPI applications installed with Conda will not work properly unless they are built against the system MPI. See the instructions on installing Rmpi.
Note that package installations must be performed on a login node, because compute nodes cannot connect to the internet. Documentation on installing packages offline may be available from the software publisher or in the official R documentation.
By default, the first time a package is installed from R, R prompts the user for permission to create a local directory for installing additional packages. After this directory is created, R uses it as the default library location unless told otherwise. Let R create the local library directory unless you have a reason not to.
To use an R library directory other than the R default, do the following:
Create the directory for storing the packages, e.g.
mkdir -p ~/libs/R_libs
Then make it the default location for finding and installing new packages by creating a file called .Renviron in your home directory that sets R_LIBS to that directory, e.g. a .Renviron file containing the line
R_LIBS=~/libs/R_libs
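The two steps above can be combined in a small bash function. This is a sketch: the function name set_r_libs is hypothetical, it writes the expanded absolute path (for example /home/username/libs/R_libs) instead of ~/ so that it does not depend on tilde expansion, and it leaves an existing R_LIBS line untouched rather than overwriting it. The directory is created first because R leaves library directories that do not exist out of its search path.

```shell
#!/bin/bash
# Sketch: create a personal R library directory and point R_LIBS at it.
# Usage: set_r_libs [library_dir] [renviron_file]
# Both arguments are optional; the defaults follow the example in the text.
set_r_libs() {
  local libdir=${1:-$HOME/libs/R_libs}
  local renviron=${2:-$HOME/.Renviron}

  # R ignores library directories that do not exist, so create it (and parents) first.
  mkdir -p "$libdir"

  # Append the setting only if the file does not already define R_LIBS.
  if [ -f "$renviron" ] && grep -q '^R_LIBS=' "$renviron"; then
    echo "R_LIBS is already set in $renviron; leaving it unchanged." >&2
  else
    echo "R_LIBS=$libdir" >> "$renviron"
  fi
}
```

Running set_r_libs with no arguments creates ~/libs/R_libs and adds the corresponding R_LIBS line to ~/.Renviron. If R_LIBS is already defined there, edit that line by hand instead.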
When using multiple versions of R, or multiple versions of libraries and packages, create a separate library directory for each and specify its location when installing and when loading packages. Packages installed under one version or build of R may not work with another. To install to a library location other than the one defined in .Renviron (create the directory first), use
install.packages("packagename", lib = "~/libs/R_libs_v2")
To load a package from a different location, use
library("packagename", lib.loc = "~/libs/R_libs_v2")
If the required local R packages take more space than is available in the home directory, follow these suggestions.
bsub < Rbsub.sh
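Note that --vanilla is used only to ensure that the example works for any user. In general, do not use --vanilla: it stops R from reading user startup files such as ~/.Renviron and ~/.Rprofile, so local settings are ignored and packages installed in a user-defined library location will not be found.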
Last modified: September 30 2024 13:58:42.