R
R is an open-source statistics package.
External Links:
CRAN R website
An Introduction to R
Loading R
There are various versions of R on Hazel. To see the available versions, use module avail R.
To set the environment with the default module, do:
module load R
Note: Do not module load R when using a Conda-based R environment. See the Conda section for details.
Running R
Do not use R on a login node for anything other than installing new R packages or other non-computational tasks. To use R interactively for test and debug, use an interactive session on a compute node. To confirm that the session is not on a login node, type hostname.
Batch R
Before running an R job, verify whether the R functions used are serial or parallel. See this note on parallel jobs for more information. Do not run code before determining whether or not it contains parallel functions.
Serial R
Here is an example batch script for a serial job. For more information on submitting batch scripts, see the documentation on running jobs.
To run R in batch mode using an R script called my_program.R, create a text file called submit.sh containing:
#!/bin/bash
#SBATCH --time=00:20:00
#SBATCH --ntasks=1
##SBATCH --mem=??G      #Specify maximum memory required
##SBATCH --exclusive    #Use exclusive only if necessary
#SBATCH --output=out.%j
#SBATCH --error=err.%j
module load R
Rscript my_program.R
The job can be submitted as
sbatch submit.sh
The script submit.sh requests one core for 20 minutes, and the syntax for specifying memory requirements is included for reference but commented out. Check the documentation on Slurm options to customize the batch script, and see the documentation on requesting memory resources if the job is expected to be memory intensive.
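The batch script above runs my_program.R but that file is not shown. As a sketch, the following writes a minimal stand-in via a heredoc; the contents are a placeholder, and your real analysis goes in this file.

```shell
# Write a minimal stand-in for my_program.R (placeholder contents).
cat > my_program.R <<'EOF'
# Read optional command-line arguments, if the batch script passes any.
args <- commandArgs(trailingOnly = TRUE)
# A trivial serial computation.
x <- rnorm(1000)
cat("mean of 1000 draws:", mean(x), "\n")
EOF
```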
For an example on submitting multiple R jobs, see the following sample R script for multiple job submissions. The script defines various years and models, and then it uses sbatch and Rscript to submit a separate job for each scenario.
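The linked sample script is authoritative; as a rough sketch of the same pattern, the loop below submits one job per (year, model) scenario. The variable names, scenario values, and sbatch options here are assumptions, not the sample script's actual contents.

```shell
#!/bin/bash
# Hypothetical multiple-submission driver: one batch job per scenario.
years="2020 2021 2022"
models="modelA modelB"
for year in $years; do
  for model in $models; do
    # The R script reads the scenario via commandArgs(trailingOnly = TRUE).
    # "echo" prints each command for inspection; remove it to actually submit.
    echo sbatch --time=00:20:00 --ntasks=1 \
      --wrap="module load R; Rscript my_program.R $year $model"
  done
done
```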
Parallel R - shared memory
Using R in shared memory means that the program runs on one node only; there is no communication across nodes. R programs that do not use MPI have shared-memory parallelism only. For example, the following requests 8 cores, and --nodes=1 ensures that all of them are on the same node.
#SBATCH --ntasks=8
#SBATCH --nodes=1
##SBATCH --exclusive    #Uncomment if program will spawn additional threads
Many parallel R libraries and functions automatically spawn as many threads as exist on a node. Either modify the R code to explicitly limit the number of cores used, or request exclusive use of the node.
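One way to limit the core count, sketched below, is to cap implicit threading at the number of tasks Slurm allocated. Whether a given R package honors these environment variables is an assumption to verify for your own code; OpenMP- and MKL-backed libraries usually do.

```shell
#!/bin/bash
# Cap implicit threading at the Slurm allocation (defaults to 1 if unset).
export OMP_NUM_THREADS=${SLURM_NTASKS:-1}
export MKL_NUM_THREADS=${SLURM_NTASKS:-1}
echo "thread cap: $OMP_NUM_THREADS"
# Explicit parallel calls inside R can be capped the same way, e.g.
# parallel::mclapply(x, f, mc.cores = as.integer(Sys.getenv("SLURM_NTASKS", "1")))
```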
Parallel R - distributed memory
To use Rmpi, load the following modules:
module load R
module load openmpi-gcc/openmpi1.8.4-gcc4.8.2
Rmpi spawns its own MPI tasks, so it is necessary to launch the R script with a single task. For example, to use 64 tasks in an Rmpi job, the following would be used:
#SBATCH --ntasks=64
#....
module load R
module load openmpi-gcc/openmpi1.8.4-gcc4.8.2
srun -n 1 Rscript my_program.R
Please see and run the sample Rmpi code and batch submission script, Parallel Hello World, before using a script with Rmpi.
Interactive R
An interactive session may be used for short debugging or in order to prepare a production batch script. Interactive sessions must be kept to a minimum and only used when necessary. Nodes left idle or underutilized by long running interactive sessions may be terminated.
The following requests an interactive session with 1 core for 10 minutes. The --exclusive flag requests the entire node. Memory-intensive R jobs (e.g. manipulating GeoTIFFs or rasters) should request the entire node regardless of the number of cores needed. After the interactive session begins, open R.
salloc --ntasks=1 --exclusive --time=00:10:00
module load R
R
To exit R, type quit(). To exit the interactive session on the compute node, type exit.
Installing R Packages
R packages should be installed in /usr/local/usrapps/$GROUP/$USER/libs/R. Home directories are too small for most R package collections. If your group does not yet have a /usr/local/usrapps/$GROUP directory, request one here.
All package installations must be performed on a login node, as compute nodes cannot connect to the internet.
Setting up the R library path
Create the directory for your R packages:
mkdir -p /usr/local/usrapps/$GROUP/$USER/libs/R
Create a file called ~/.Renviron containing:
R_LIBS=/usr/local/usrapps/$GROUP/$USER/libs/R
To verify that R sees the library path, open R and type:
> .libPaths()
The first path listed should be your /usr/local/usrapps directory.
Installing with Conda (recommended for complex dependencies)
Conda is the preferred installation method for R packages that need external dependencies or newer compilers. Some R packages have complex dependencies on external libraries, each with its own required versions and build options. When using install.packages, a user may have to install those dependencies manually; in that case, a Conda environment is preferable.
See the instructions for installing software with Conda for general details about the procedure, which includes this YAML file for installing a set of R packages. Note that the example rlibs.yml is just an example; the packages included are not necessary for every environment.
Create a YAML file that includes every library call contained in the set of R scripts. Most packages follow the naming convention r-packagename. Do a search to confirm whether a Conda package exists for the R library and, if so, to check the name and preferred channel. If an R library does not exist as a Conda package, install.packages may be used in the usual way from within R while the Conda environment containing the other Conda-installed packages is active.
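As a hypothetical sketch of such a YAML file (the package list and channel here are assumptions; include one r-* entry per library() call in your scripts):

```shell
# Write an example Conda environment file for a small set of R packages.
cat > my_r_env.yml <<'EOF'
name: my_r_env
channels:
  - conda-forge
dependencies:
  - r-base
  - r-ggplot2
  - r-dplyr
EOF
# Build it on a login node:
# conda env create -f my_r_env.yml
```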
When installing R libraries via Conda, the system R is not used; Conda installs a different version of R based on the compatibility of the included libraries. Therefore do not module load R when running from a Conda-based R environment. Instead, use conda activate as described in the Conda documentation.
When using R in a Conda environment, remove any ~/.Renviron file, as Conda manages its own R and library paths. Also remove any hard-coded paths in R scripts that point to libraries installed with the system R.
Installing Rmpi within a Conda environment
Most packages in R function with shared memory only, and these jobs must be submitted with #SBATCH --nodes=1. See the video on Parallel Jobs for more information about shared vs distributed memory.
The Rmpi package allows users to parallelize their scripts with MPI. MPI applications installed with Conda will not function properly unless they are built against the system MPI. See the following for instructions on installing Rmpi.
Installing with install.packages
After setting up the R library path, install.packages() works as usual. Packages will install to the first path listed in .libPaths(), which should be your /usr/local/usrapps directory.
Package installations must be performed on a login node, as compute nodes cannot connect to the internet.
When using multiple versions of R or multiple versions of libraries and packages, a user may create multiple library directories and specify the library locations when installing and when loading the libraries. Libraries installed under one version or compilation of R may not work with another version. To install to a specific library location, do
install.packages("packagename", lib="/usr/local/usrapps/$GROUP/$USER/libs/R_v2")
To load a library from a specific location, do
library("packagename", lib.loc="/usr/local/usrapps/$GROUP/$USER/libs/R_v2")
Setting the environment for user installed packages
When using R libraries, the run environment should be the same as the compile environment. For example, if a certain module was loaded or the export command was used when installing a package, that same module or environment variable must be used when running the package.
Bioconductor
For bioinformatics, Bioconductor is a widely used tool. Install the newest base R in a new Conda environment, install BiocManager with install.packages, and then use BiocManager::install for Bioconductor packages within that environment. BiocManager can also be used with the system R after setting up the R library path. Please see the Bioconductor documentation for more details and the latest version.
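As a sketch, the heredoc below writes a small installer script for this workflow; the Bioconductor package name (DESeq2) is only an example, so substitute the packages you need.

```shell
# Write a small R installer script for Bioconductor packages.
cat > install_bioc.R <<'EOF'
# Install BiocManager into the first directory on .libPaths(),
# then use it to install Bioconductor packages.
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("DESeq2")
EOF
# Run it on a login node with your environment active:
# Rscript install_bioc.R
```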
Example R batch script for Rmpi
Here is a sample R job, having an R script (*.R), a submit script (*.sh), the R output (*.Rout), and the Slurm standard out and standard error files.
Parallel Hello World
Uses Rmpi to say hello 500 times. Each of 63 worker processes (the 64th is the master, which does not say hello) waits its turn and says hello in round-robin fashion until the 500 hellos are finished. Being sequential, it does not demonstrate how to speed up existing code; its purpose is to write the host names and core numbers to check whether R is spawning workers correctly. If this is not done correctly, all of the workers may end up on one node. Submit using
sbatch Rbsub.sh
Note that --vanilla is only used to ensure the example works for any user. In general, do not use --vanilla, or R will not recognize local settings and user-installed packages.
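The linked Parallel Hello World sample is the authoritative version; as a rough sketch of the same idea, the heredoc below writes a tiny Rmpi script whose workers report their host and rank so placement across nodes can be checked.

```shell
# Write a minimal Rmpi placement-check script (a sketch, not the sample code).
cat > rmpi_hello.R <<'EOF'
library(Rmpi)
# Spawn one worker per remaining MPI slot; the launching task is the master.
mpi.spawn.Rslaves()
# Each worker reports its rank and host, to confirm workers span the nodes.
mpi.remote.exec(paste("Hello from rank", mpi.comm.rank(),
                      "on", Sys.info()["nodename"]))
mpi.close.Rslaves()
mpi.quit()
EOF
# Launch with a single task; Rmpi spawns the rest:
# srun -n 1 Rscript rmpi_hello.R
```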
Last modified: March 14 2026 09:24:22.