R

Loading R

module avail

module load R

Running R

Do not use R on a login node for anything other than compiling new R packages or other non-computational tasks. To use R interactively for test and debug, use an interactive session on a compute node. To confirm that the session is not on a login node, type hostname.
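The check can also be scripted, as sketched below. The sketch assumes that LSF sets the LSB_JOBID environment variable inside every job, including interactive sessions, so an empty value suggests the shell is not running under the scheduler. The function name in_lsf_job is hypothetical, and typing hostname remains the direct way to see which node the shell is on.

```shell
# Sketch: guess whether this shell is running inside an LSF job.
# Assumption: LSF exports LSB_JOBID in job environments (batch and interactive).
in_lsf_job() {
    [ -n "${LSB_JOBID:-}" ]
}

if in_lsf_job; then
    echo "Inside LSF job ${LSB_JOBID} on $(hostname)"
else
    echo "No LSF job detected on $(hostname); this may be a login node"
fi
```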

Batch R

See the note on parallel jobs.

Serial R

Here is an example batch script for a serial job. For more information on submitting batch scripts, see the documentation on running jobs.

To run R in batch mode using an R script called my_program.R, create a text file called submit.sh containing:

#!/bin/bash
#BSUB -W 20
#BSUB -n 1
##BSUB -R "rusage[mem=??GB]"   #Specify maximum memory required
##BSUB -x                      #Use exclusive only if necessary 
#BSUB -o out.%J
#BSUB -e err.%J
module load R
Rscript my_program.R

The job can be submitted as

bsub < submit.sh

The script submit.sh requests one core for 20 minutes, and the syntax for specifying memory requirements is included for reference but commented out. Check the documentation on LSF options to customize the batch script, and see the documentation on requesting memory resources if the job is expected to be memory intensive.

For an example of submitting multiple R jobs, see the sample R script for multiple job submissions. The script defines a set of years and models, then uses bsub and Rscript to submit a separate job for each scenario.
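The same pattern can be sketched in shell rather than R. The years, model names, and script name run_scenario.R below are hypothetical placeholders. The sketch only prints the bsub commands unless DRY_RUN is set to 0, so the commands can be inspected before anything is submitted.

```shell
# Sketch: one LSF job per (year, model) scenario.
# YEARS, MODELS, and run_scenario.R are hypothetical placeholders.
YEARS="2001 2002"
MODELS="modelA modelB"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands only, 0 = actually submit

submit_scenarios() {
    for year in $YEARS; do
        for model in $MODELS; do
            # Same options as submit.sh: 20 minutes, 1 core, per-job output files.
            cmd="bsub -W 20 -n 1 -o out.%J -e err.%J Rscript run_scenario.R $year $model"
            if [ "$DRY_RUN" = "1" ]; then
                echo "$cmd"
            else
                $cmd
            fi
        done
    done
}

submit_scenarios
```

Because these jobs are submitted from the command line rather than from a batch script, the sketch assumes that module load R was run in the submitting shell and that the jobs inherit that environment.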

Parallel R - shared memory

Using R in shared memory means that the program runs on a single node; there is no communication across nodes. R programs that do not use MPI have shared-memory parallelism only. For example, the following requests 8 cores, and span[hosts=1] ensures that all of the cores are on the same node. Uncomment the -x line to request exclusive use of the node.

#BSUB -n 8
#BSUB -R span[hosts=1]
##BSUB -x               #Uncomment if program will spawn additional threads
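Inside such a job, the R script should not start more workers than the number of cores LSF granted. The following is a minimal sketch, assuming LSF sets LSB_DJOB_NUMPROC to the number of allocated cores. The function name worker_count and the idea of passing the count as a script argument are illustrative only.

```shell
# Sketch: derive the worker count from the LSF allocation.
# Assumption: LSF sets LSB_DJOB_NUMPROC to the number of cores granted to the job.
worker_count() {
    echo "${LSB_DJOB_NUMPROC:-1}"   # fall back to 1 outside of a job
}

# Print the command instead of running it, so the sketch is safe to try anywhere.
echo "Would run: Rscript my_program.R $(worker_count)"
```

In the R script, the value could then be read with commandArgs(trailingOnly = TRUE) and used as the number of workers, for example as the mc.cores argument of parallel::mclapply.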

Parallel R - distributed memory

Rmpi

module load R
module load openmpi-gcc/openmpi1.8.4-gcc4.8.2

#BSUB -n 64
#....
module load R
module load openmpi-gcc/openmpi1.8.4-gcc4.8.2
mpirun -n 1 Rscript my_program.R   #my_program.R is a placeholder for the Rmpi script

Please see and run the sample Rmpi code and batch submission script, Parallel Hello World, before using a script with Rmpi.

Interactive R

An interactive session may be used for short debugging or in order to prepare a production batch script. Interactive sessions must be kept to a minimum and only used when necessary. Nodes left idle or underutilized by long running interactive sessions may be terminated.

The following requests an interactive session with 1 core for 10 minutes. The -x option requests exclusive use of the entire node. Memory-intensive R jobs (e.g. manipulating GeoTIFFs or rasters) should request the entire node regardless of the number of cores needed. After the interactive session begins, load the R module and open R.

bsub -Is -n 1 -x -W 10 bash
module load R
R

quit()

exit

Installing R packages with Conda

Conda is the preferred method of installation for R packages that need external dependencies and newer compilers.

Some R packages have complex dependencies on external libraries, each requiring particular versions and options. With install.packages, a user may have to install those dependencies manually. In such cases, a Conda environment is preferable to install.packages. See the instructions for installing software with Conda for general details about the procedure, including a sample YAML file, rlibs.yml, for installing a set of R packages. The file is only an example; the packages it includes are not necessary for every environment.

Create a YAML file that includes every library call contained in a set of R scripts. Most packages have the naming convention r-packagename. Do a search to confirm whether a Conda package exists for the R library, and if so, to check the name and preferred channel. If an R library does not exist as a Conda package, then install.packages may be used in the general fashion using R while the Conda environment containing the other Conda-installed packages is active.
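As a rough illustration of the naming convention, the sketch below prints an environment file for a list of R library names. The function name r_env_yaml, the environment name my_r_env, the conda-forge channel, and the example packages are all assumptions for illustration. A search is still needed to confirm each Conda package name and its preferred channel.

```shell
# Sketch: print a Conda environment file for a list of R libraries.
# Assumptions: the conda-forge channel and the r-<lowercase name> convention;
# confirm each package with a search before relying on the result.
r_env_yaml() {
    env_name="$1"
    shift
    echo "name: $env_name"
    echo "channels:"
    echo "  - conda-forge"
    echo "dependencies:"
    echo "  - r-base"
    for pkg in "$@"; do
        # Conda package names are lowercase, e.g. Rcpp -> r-rcpp
        echo "  - r-$(echo "$pkg" | tr '[:upper:]' '[:lower:]')"
    done
}

r_env_yaml my_r_env ggplot2 Rcpp
```

Redirecting the output to a file, e.g. r_env_yaml my_r_env ggplot2 Rcpp > rlibs.yml, gives a starting point that can be edited by hand before creating the environment as described in the Conda documentation.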

For bioinformatics, Bioconductor is a great tool. Install the newest base R in a new Conda environment, install BiocManager with install.packages within that environment, and then install Bioconductor packages with BiocManager::install. Please see the Bioconductor documentation for more details and the latest version.

When R libraries are installed via Conda, the system R is not used. Instead, Conda installs its own version of R, chosen for compatibility with the included libraries. Therefore, do not module load R when running from a Conda-based R environment; use conda activate as described in the Conda documentation. Additionally, when using R in a Conda environment, remove any hard-coded paths in the R scripts to libraries installed with the system R, and remove the file ~/.Renviron.

Installing Rmpi within a Conda environment

Most packages in R function with shared memory only, and these jobs must be submitted with #BSUB -R span[hosts=1]. See the video on Parallel Jobs for more information about shared vs distributed memory.

The Rmpi package allows users to parallelize their scripts with MPI. MPI applications installed with Conda will not function properly unless they are installed with the system MPI. See the following for instructions on installing Rmpi.

Installing R packages with `install.packages`

If install.packages with an HPC-installed R returns errors related to library or package dependencies, use Conda instead, as described above.

Note that package installations must be performed on a login node, because compute nodes cannot connect to the internet. Documentation on installing packages offline may be available from the software publisher or in the official R documentation.

When using R to install a package, by default, R will prompt the user for permission to create a local directory for installing the additional libraries. After the initial directory creation (installation of the first package), R will use this as the default library location unless specified otherwise. Let R create the local library directory unless you have reason not to.

If you have reason to set the R library directory to something other than the R default, do the following:
Create the directory for storing the packages, e.g. mkdir -p ~/libs/R_libs. Then specify the user-defined default location for finding and installing packages by creating a file called .Renviron in your home directory that defines R_LIBS as the desired location, e.g. a file containing the text R_LIBS=~/libs/R_libs.
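These two steps can be sketched as a small shell function. The function name setup_r_libs and its two optional arguments (a home directory and a library subdirectory) are hypothetical, introduced so that the sketch can be tried against a scratch directory first. It appends to .Renviron only if the file does not already define R_LIBS.

```shell
# Sketch: create a personal R library directory and point R at it.
# setup_r_libs and its arguments are illustrative, not a site-provided tool.
setup_r_libs() {
    home_dir="${1:-$HOME}"
    lib_subdir="${2:-libs/R_libs}"
    mkdir -p "$home_dir/$lib_subdir" || return 1
    renviron="$home_dir/.Renviron"
    # Append the setting only if R_LIBS is not already defined in the file.
    if ! grep -q '^R_LIBS=' "$renviron" 2>/dev/null; then
        printf 'R_LIBS=~/%s\n' "$lib_subdir" >> "$renviron"
    fi
}
```

Calling setup_r_libs with no arguments acts on the real home directory; calling it with a temporary directory as the first argument shows what would be written without touching an existing ~/.Renviron.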

When using multiple versions of R or multiple versions of libraries and packages, a user may create multiple libraries and specify the library locations when installing and when loading the libraries. Libraries installed under one version or compilation of R may not work with another version. To install to a library location different from the one defined in .Renviron, do install.packages("packagename",lib="~/libs/R_libs_v2"). To load a library in a different location, do library("packagename",lib.loc="~/libs/R_libs_v2").

If the required local R packages take more space than available in a home directory, follow these suggestions.

Setting the environment for user installed packages

export

installing

running

Example R batch script for Rmpi

Parallel Hello World