R
R is an open-source statistics package.
External Links:
CRAN R website
An Introduction to R
Loading R
There are various versions of R on Hazel. To see the available versions, use module avail R.
To set the environment with the default module, do:
module load R
Note: Do not module load R when using a Conda-based R environment. See the Conda section for details.
Running R
Do not use R on a login node for anything other than installing new R packages or other non-computational tasks. To use R interactively for test and debug, use an interactive session on a compute node. To confirm that the session is not on a login node, type hostname.
Batch R
Before running an R job, verify whether the R functions used are serial or parallel. See this note on parallel jobs for more information. Do not run code before determining whether or not it contains parallel functions.
Serial R
Here is an example batch script for a serial job. For more information on submitting batch scripts, see the documentation on running jobs.
To run R in batch mode using an R script called my_program.R, create a text file called submit.sh containing:
#!/bin/bash
#SBATCH --time=00:20:00
#SBATCH --ntasks=1
##SBATCH --mem=??G      #Specify maximum memory required
##SBATCH --exclusive    #Use exclusive only if necessary
#SBATCH --output=out.%j
#SBATCH --error=err.%j
module load R
Rscript my_program.R
The job can be submitted as
sbatch submit.sh
The script submit.sh requests one core for 20 minutes, and the syntax for specifying memory requirements is included for reference but commented out. Check the documentation on Slurm options to customize the batch script, and see the documentation on requesting memory resources if the job is expected to be memory intensive.
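The batch script above runs my_program.R but that file is not shown. As a sketch, the following writes a minimal stand-in via a heredoc; the contents are a placeholder, and your real analysis goes in this file.

```shell
# Write a minimal stand-in for my_program.R (placeholder contents).
cat > my_program.R <<'EOF'
# Read optional command-line arguments, if the batch script passes any.
args <- commandArgs(trailingOnly = TRUE)
# A trivial serial computation.
x <- rnorm(1000)
cat("mean of 1000 draws:", mean(x), "\n")
EOF
```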
For an example on submitting multiple R jobs, see the following sample R script for multiple job submissions. The script defines various years and models, and then it uses sbatch and Rscript to submit a separate job for each scenario.
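The linked sample script is authoritative; as a rough sketch of the same pattern, the loop below submits one job per (year, model) scenario. The variable names, scenario values, and sbatch options here are assumptions, not the sample script's actual contents.

```shell
#!/bin/bash
# Hypothetical multiple-submission driver: one batch job per scenario.
years="2020 2021 2022"
models="modelA modelB"
for year in $years; do
  for model in $models; do
    # The R script reads the scenario via commandArgs(trailingOnly = TRUE).
    # "echo" prints each command for inspection; remove it to actually submit.
    echo sbatch --time=00:20:00 --ntasks=1 \
      --wrap="module load R; Rscript my_program.R $year $model"
  done
done
```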
Parallel R - shared memory
Using R in shared memory means that the program runs on one node only; there is no communication across nodes. R programs that do not use MPI have shared-memory parallelism only. For example, the following requests 8 cores, and --nodes=1 ensures that all of them are on the same node.
#SBATCH --ntasks=8
#SBATCH --nodes=1
##SBATCH --exclusive    #Uncomment if program will spawn additional threads
Many parallel R libraries and functions automatically spawn as many threads as exist on a node. Either modify the R code to explicitly limit the number of cores used, or request exclusive use of the node.
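One way to limit the core count, sketched below, is to cap implicit threading at the number of tasks Slurm allocated. Whether a given R package honors these environment variables is an assumption to verify for your own code; OpenMP- and MKL-backed libraries usually do.

```shell
#!/bin/bash
# Cap implicit threading at the Slurm allocation (defaults to 1 if unset).
export OMP_NUM_THREADS=${SLURM_NTASKS:-1}
export MKL_NUM_THREADS=${SLURM_NTASKS:-1}
echo "thread cap: $OMP_NUM_THREADS"
# Explicit parallel calls inside R can be capped the same way, e.g.
# parallel::mclapply(x, f, mc.cores = as.integer(Sys.getenv("SLURM_NTASKS", "1")))
```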
Parallel R - distributed memory
To use Rmpi, load the following modules:
module load R
module load openmpi-gcc/openmpi1.8.4-gcc4.8.2
Rmpi spawns its own MPI tasks, so it is necessary to launch the R script with a single task. For example, to use 64 tasks in an Rmpi job, the following would be used:
#SBATCH --ntasks=64
#....
module load R
module load openmpi-gcc/openmpi1.8.4-gcc4.8.2
srun -n 1 Rscript my_program.R
Please see and run the sample Rmpi code and batch submission script, Parallel Hello World, before using a script with Rmpi.
Interactive R
An interactive session may be used for short debugging or in order to prepare a production batch script. Interactive sessions must be kept to a minimum and only used when necessary. Nodes left idle or underutilized by long running interactive sessions may be terminated.
The following requests an interactive session with 1 core for 10 minutes. The --exclusive flag requests the entire node. Memory-intensive R jobs (e.g. manipulating GeoTIFFs or rasters) should request the entire node regardless of the number of cores needed. After the interactive session begins, open R.
salloc --ntasks=1 --exclusive --time=00:10:00
module load R
R
To exit R, type quit(). To exit the interactive session on the compute node, type exit.
Installing R Packages
R packages should be installed in /usr/local/usrapps/$GROUP/$USER/libs/R. Home directories are too small for most R package collections. If your group does not yet have a /usr/local/usrapps/$GROUP directory, request one here.
All package installations must be performed on a login node, as compute nodes cannot connect to the internet.
Setting up the R library path
Create the directory for your R packages:
mkdir -p /usr/local/usrapps/$GROUP/$USER/libs/R
Create a file called ~/.Renviron containing:
R_LIBS=/usr/local/usrapps/$GROUP/$USER/libs/R
To verify that R sees the library path, open R and type:
> .libPaths()
The first path listed should be your /usr/local/usrapps directory.
Installing with Conda (recommended for complex dependencies)
Conda is the preferred installation method for R packages that need external dependencies or newer compilers. Some R packages have complex dependencies on external libraries, each with its own required versions and build options. When using install.packages, a user may have to install those dependencies manually; in that case, a Conda environment is preferable.
See the instructions for installing software with Conda for general details about the procedure, which includes this YAML file for installing a set of R packages. Note that the example rlibs.yml is just an example; the packages included are not necessary for every environment.
Create a YAML file that includes every library call contained in the set of R scripts. Most packages follow the naming convention r-packagename. Do a search to confirm whether a Conda package exists for the R library and, if so, to check the name and preferred channel. If an R library does not exist as a Conda package, install.packages may be used in the usual way from within R while the Conda environment containing the other Conda-installed packages is active.
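As a hypothetical sketch of such a YAML file (the package list and channel here are assumptions; include one r-* entry per library() call in your scripts):

```shell
# Write an example Conda environment file for a small set of R packages.
cat > my_r_env.yml <<'EOF'
name: my_r_env
channels:
  - conda-forge
dependencies:
  - r-base
  - r-ggplot2
  - r-dplyr
EOF
# Build it on a login node:
# conda env create -f my_r_env.yml
```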
When installing R libraries via Conda, the system R is not used; Conda installs a different version of R based on the compatibility of the included libraries. Therefore do not module load R when running from a Conda-based R environment. Instead, use conda activate as described in the Conda documentation.
When using R in a Conda environment, remove any ~/.Renviron file, as Conda manages its own R and library paths. Also remove any hard-coded paths in R scripts that point to libraries installed with the system R.
Installing Rmpi within a Conda environment
Most packages in R function with shared memory only, and these jobs must be submitted with #SBATCH --nodes=1. See the video on Parallel Jobs for more information about shared vs distributed memory.
The Rmpi package allows users to parallelize their scripts with MPI. MPI applications installed with Conda will not function properly unless they are built against the system MPI. See the following for instructions on installing Rmpi.
Installing with install.packages
After setting up the R library path, install.packages() works as usual. Packages will install to the first path listed in .libPaths(), which should be your /usr/local/usrapps directory.
Package installations must be performed on a login node, as compute nodes cannot connect to the internet.
When using multiple versions of R or multiple versions of libraries and packages, a user may create multiple library directories and specify the library locations when installing and when loading the libraries. Libraries installed under one version or compilation of R may not work with another version. To install to a specific library location, do
install.packages("packagename", lib="/usr/local/usrapps/$GROUP/$USER/libs/R_v2")
To load a library from a specific location, do
library("packagename", lib.loc="/usr/local/usrapps/$GROUP/$USER/libs/R_v2")
Setting the environment for user installed packages
When using R libraries, the run environment should be the same as the compile environment. For example, if a certain module was loaded or the export command was used when installing a package, that same module or environment variable must be used when running the package.
Bioconductor
For bioinformatics, Bioconductor is a widely used tool. Install the newest base R in a new Conda environment, install BiocManager with install.packages, and then use BiocManager::install for Bioconductor packages within that environment. BiocManager can also be used with the system R after setting up the R library path. Please see the Bioconductor documentation for more details and the latest version.
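As a sketch, the heredoc below writes a small installer script for this workflow; the Bioconductor package name (DESeq2) is only an example, so substitute the packages you need.

```shell
# Write a small R installer script for Bioconductor packages.
cat > install_bioc.R <<'EOF'
# Install BiocManager into the first directory on .libPaths(),
# then use it to install Bioconductor packages.
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("DESeq2")
EOF
# Run it on a login node with your environment active:
# Rscript install_bioc.R
```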
Example R batch script for Rmpi
Here is a sample R job, having an R script (*.R), a submit script (*.sh), the R output (*.Rout), and the Slurm standard out and standard error files.
Parallel Hello World
Uses Rmpi to say hello 500 times. Each of 63 worker processes (the 64th is the master, which does not say hello) waits its turn and says hello in round-robin fashion until the 500 hellos are finished. Being sequential, it does not demonstrate how to speed up existing code; its purpose is to write the host names and core numbers to check whether R is spawning workers correctly. If this is not done correctly, all of the workers may end up on one node. Submit using
sbatch Rbsub.sh
Note that --vanilla is only used to ensure the example works for any user. In general, do not use --vanilla, or R will not recognize local settings and user-installed packages.
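The linked Parallel Hello World sample is the authoritative version; as a rough sketch of the same idea, the heredoc below writes a tiny Rmpi script whose workers report their host and rank so placement across nodes can be checked.

```shell
# Write a minimal Rmpi placement-check script (a sketch, not the sample code).
cat > rmpi_hello.R <<'EOF'
library(Rmpi)
# Spawn one worker per remaining MPI slot; the launching task is the master.
mpi.spawn.Rslaves()
# Each worker reports its rank and host, to confirm workers span the nodes.
mpi.remote.exec(paste("Hello from rank", mpi.comm.rank(),
                      "on", Sys.info()["nodename"]))
mpi.close.Rslaves()
mpi.quit()
EOF
# Launch with a single task; Rmpi spawns the rest:
# srun -n 1 Rscript rmpi_hello.R
```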
Last modified: March 14 2026 09:24:22.