Conda

Conda is an open source package management system.

External Links
Conda website
Conda: User Guide

Contents

Loading and initializing Conda
Installing and activating a Conda environment
Running a Conda installed application
Warning: multithreading and MPI applications
Tips and Troubleshooting
What is a Conda environment anyway???

Loading and initializing Conda

Two steps are required before using Conda: Conda must be initialized, and a .condarc file must be created. The default shell for the new Hazel cluster is bash; the tcsh shell is also available (type "tcsh" to start it).

1) Initialize Conda environment:

This step is necessary only once for an HPC user, unless the initialization settings are removed.

To load the system installed Conda, load the module and use init to add it to the path. Normally using a login file to automatically set the environment is strongly discouraged, but in the case of Conda, many features cannot be used without setting this initialization file. Log out and then back in again after using conda init.

module load conda
conda init bash
[ignore the warnings, log out then back in again]

Optional - remove old environments:

For users who have already been using different Conda environments and would like to begin installing with the new recommended procedures, clean out the remnants of the old Conda setup. First check the login files for a block written by conda init:

cd ~
more .bashrc
more .tcshrc

If either file contains a block like the following, remove it with a text editor, including the two marker lines:

# >>> conda initialize >>>
(stuff)
# <<< conda initialize <<<
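conda init marks the block it writes with an opening line "# >>> conda initialize >>>" and a closing line "# <<< conda initialize <<<". As an alternative to editing by hand, the following is a minimal sketch that prints a login file with that block removed. The function name strip_conda_init is made up for this example, and the sketch assumes the marker lines have not been modified.

```shell
# Hypothetical helper: print the given file with the "conda init" block removed.
# The file itself is not changed.
strip_conda_init() {
    sed '/^# >>> conda initialize >>>/,/^# <<< conda initialize <<</d' "$1"
}

# Preview what would change:
#   strip_conda_init ~/.bashrc | diff ~/.bashrc -
# Apply the change while keeping a backup:
#   cp ~/.bashrc ~/.bashrc.bak && strip_conda_init ~/.bashrc.bak > ~/.bashrc
```

Previewing with diff first is a cautious choice: if anything other than the Conda block shows up in the diff, edit the file by hand instead.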

2) Create a .condarc file:
This step is mandatory: Conda will fill the quota of the home directory if pkgs_dirs is not set in this file.

By default, Conda stores downloaded packages in the /home directory. The /home directory is too small for that, and because the packages are only needed temporarily, they should not take up space in permanent directories. To change the default location to /share, use a text editor to create a file called .condarc. The path to that file should be /home/$USER/.condarc, and it should contain the path to the alternative location, e.g.:

pkgs_dirs:
  - /share/$GROUP/$USER/conda/pkgs

In addition, many packages require adding a 'channel'. Common channels may be added before creating environments by editing ~/.condarc. For example, the bioconda and conda-forge channels may be added with the following lines:
```
channels:
  - bioconda
  - conda-forge
```

The following displays a sample .condarc file:

[unityid@login01 ~]$ cd
[unityid@login01 ~]$ more .condarc
pkgs_dirs:
  - /share/group_name/unityid/conda/pkgs
channels:
  - bioconda
  - conda-forge
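
The sample file above can also be written from the command line. The following is a minimal sketch, not an official setup script: the function name write_condarc and its three arguments are made up for this example, and it refuses to overwrite an existing file so that current settings are not lost.

```shell
# Hypothetical helper: write a .condarc containing pkgs_dirs and two channels.
# Arguments: target file, group name, user name.
write_condarc() {
    target="$1"
    group="$2"
    user="$3"
    if [ -e "$target" ]; then
        echo "refusing to overwrite existing $target" >&2
        return 1
    fi
    cat > "$target" <<EOF
pkgs_dirs:
  - /share/$group/$user/conda/pkgs
channels:
  - bioconda
  - conda-forge
EOF
}

# Typical use (replace group_name with the actual group):
#   write_condarc "$HOME/.condarc" group_name "$USER"
```

Afterwards, more .condarc should show the same content as the sample file.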

Installing and activating a Conda environment

Before installing any software, including a Conda environment, request a space for user maintained software to be used by all members of a Project. The path for that space is generally /usr/local/usrapps/groupname.

The following is the general idea of how to use Conda. Please use a YAML file to avoid package conflicts. See below for details.

To install a Conda environment, specify a prefix, which is the path where the environment will be installed. Choose a descriptive name for the environment. Conda will create the directory, so the directory should not already exist. For example, to create a Conda environment called env_ABC containing the packages AAA, BBB, and CCC, and install it in the directory /usr/local/usrapps/[groupname]/[username], do:

conda create --prefix /usr/local/usrapps/$GROUP/$USER/env_ABC AAA BBB CCC

To activate the environment, do:

conda activate /usr/local/usrapps/$GROUP/$USER/env_ABC

Once in a Conda environment, a user can install additional packages using either conda install or pip install (after doing conda install pip); however, this is not recommended, because it makes it harder to maintain an environment where all software is compatible.

Best practice is to create a YAML file with all of the desired Conda packages. Conda will 'solve' the environment, that is, it will find a configuration in which all desired packages are at version numbers that work together, assuming such a configuration exists. If a user needs a version of one package that is not compatible with another, they should create two different Conda environments. When updating, create a new environment and do not delete the old version without thoroughly testing the new one.
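
As an illustration of what such a YAML file can look like, the following sketch writes a small one. The package names and the Python version are placeholders chosen for this example, not a recommended set, and the function name write_example_yaml is made up. No name entry is included because the environment location is given with --prefix.

```shell
# Hypothetical helper: write a small example environment file to the given path.
# The packages listed are placeholders for illustration only.
write_example_yaml() {
    cat > "$1" <<'EOF'
channels:
  - conda-forge
dependencies:
  - python=3.10
  - numpy
  - matplotlib
EOF
}

# Typical use:
#   write_example_yaml ABC.yml
```

Listing every package needed for the workflow in one file lets Conda solve them together rather than one at a time.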

To create a Conda environment from a YAML file called ABC.yml, do:

conda env create --prefix /usr/local/usrapps/$GROUP/$USER/env_ABC -f ABC.yml
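
Because the prefix directory must not already exist and the YAML file must be present, a small wrapper can check both before calling Conda. This is a sketch under those assumptions; the function name create_env_from_yaml is made up for this example, and the conda command it runs is the same one shown above.

```shell
# Hypothetical wrapper around "conda env create": check the inputs first,
# then create the environment at the given prefix from the given YAML file.
create_env_from_yaml() {
    prefix="$1"
    yaml="$2"
    if [ -e "$prefix" ]; then
        echo "error: $prefix already exists; choose a new prefix" >&2
        return 1
    fi
    if [ ! -f "$yaml" ]; then
        echo "error: YAML file $yaml not found" >&2
        return 1
    fi
    conda env create --prefix "$prefix" -f "$yaml"
}

# Typical use, from a login node:
#   create_env_from_yaml "/usr/local/usrapps/$GROUP/$USER/env_ABC" ABC.yml
```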

Sample YAML files:

datascience.yml - Contains many common data science programs
sklearn.yml - Machine learning with scikit-learn
biotools.yml - Contains applications for a bioinformatics workflow
ncdfutil.yml - Used in sponsored software group ncdfutil, contains many NetCDF Utilities
rlibs.yml - Used to create an environment with custom R libraries

When Conda creates an environment, it finds a configuration such that all of the packages/dependencies are compatible. If a great many packages are added to a YAML file, it might be impossible for Conda to resolve the necessary environment. In that case, multiple Conda environments will need to be created.

To deactivate the environment, do:

conda deactivate

Running a Conda installed application

Activating a Conda environment sets the compute environment, and is similar to loading a module. Here is a sample batch script that uses an application called mycode that was installed via a Conda environment:

#!/bin/bash
#BSUB -n 1
#BSUB -W 120
#BSUB -J mycode
#BSUB -o stdout.%J
#BSUB -e stderr.%J
source ~/.bashrc
conda activate /usr/local/usrapps/groupname/username/env_mycode
mycode
conda deactivate

Warning: multithreading and MPI applications

Multi-threading: Many of the applications available in Conda environments run in parallel automatically, that is, they may detect the number of cores on the node and spawn that many tasks, regardless of how many cores the job requested. See the following for more information on testing, and contact HPC Staff for further assistance.
Using MPI: Conda is a package management system that installs applications and all of their dependencies. That means that if an installed package requires MPI, Conda will install its own MPI. User installed MPI will not work properly with LSF, therefore a user must use the system MPI. See the following documentation on using system MPI to install R or Python packages requiring MPI.
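
For the multithreading case, one common way to keep an application within the cores requested with #BSUB -n is to set thread-count environment variables before launching it. The sketch below assumes the application uses OpenMP or a math library that honors these variables, which is not true of every package, and it assumes LSF sets LSB_DJOB_NUMPROC to the number of cores given to the job. The function name limit_threads is made up for this example.

```shell
# Hypothetical helper: cap the thread count used by OpenMP and by common
# threaded math libraries at the given number.
limit_threads() {
    n="$1"
    export OMP_NUM_THREADS="$n"
    export MKL_NUM_THREADS="$n"
    export OPENBLAS_NUM_THREADS="$n"
}

# In a batch script, call this before running the application.
# Assumption: LSB_DJOB_NUMPROC holds the number of cores allocated by LSF;
# fall back to 1 if it is unset.
limit_threads "${LSB_DJOB_NUMPROC:-1}"
```

Whether a given application respects these settings should still be verified by testing.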

Tips and Troubleshooting

To list the packages contained in an activated Conda environment, do conda list.
Alternate versions of Python and various packages may be specified by following the package name with =version.number, e.g. matplotlib=3.1.
All installations must be done from a login node. Packages cannot be downloaded from a compute node, and neither a compute node nor an HPC-VCL node can write to /usr/local/usrapps.
Use a YAML file! If you create a Conda environment and then attempt to add something to it, Conda may not be able to reconcile the existing applications and libraries with the new application. This is common in R or Python, when newer versions of libraries only work with newer releases. It also happens when attempting to install an older package into an existing environment. For example, older packages may need Python 2, and therefore cannot be used with a Python 3 environment. The message will say "solving environment" and it will continue for quite some time before eventually giving up. If this happens, create a new environment that includes the new application. (Keep the old environment!)
Create multiple environments rather than adding to them. Using conda install on an existing environment may break that environment, resulting in your scripts suddenly not working anymore.
Check your syntax: don't forget either the 'env' or the '-f' in the command conda env create --prefix /path/to/env_ABC -f ABC.yml.
PackagesNotFoundError: If you get this error, first check your syntax. Next, make sure the package exists, and if so, which channel or channels it is available from. Some Python packages are not available through Conda, and some Conda packages are only available through specific channels. You can find this information by searching for the package name on the Anaconda.org search page.
The instructions use 'pip install'. Some packages are only available through GitHub or some other website or collaborator. In this case, go to the website for the package and carefully read the instructions. Create a YAML file with all of the dependencies listed in the instructions, including version numbers when indicated, and also include any additional applications or Python packages that you will use while doing your workflow. After creating and activating the Conda environment from the YAML file, follow the remaining instructions for the "pip install". The packages installed by pip will then be available when you activate that Conda environment.
Python notebooks on the HPC-VCL: If the application includes examples using IPython or Jupyter notebook, they can be tested using the HPC-VCL. See these instructions for requesting access to and using the HPC-VCL.
Python notebooks on a compute node: Jupyter notebook is a web-based coding environment that gives you the ability to run code in “chunks” rather than executing a whole script at once. Although Jupyter notebook runs in your browser, the underlying computation is executed on a remote node, which in our case is the Hazel cluster. This means that there are a few steps that need to be done to set it up.

What is a Conda environment anyway???

Most of the time, Conda installs applications by downloading precompiled binaries from package repositories. Sometimes it downloads source code and then compiles it on the cluster, in which case other modules or packages may need to be installed or linked. For example, you may need to load the CUDA modules for ML/AI packages.
The transient files (tarballs, source code, etc.) are downloaded to a temporary directory, which is defined in the .condarc under 'pkgs_dirs'. If that is not set, the default is to download to the home directory. The home directory is too small for this, and as these packages are intermediate products, they should not be saved to a directory that has limited space or is backed up. Don't waste permanent storage on them; put them in /share, as shown by the example .condarc file.
After downloading, the applications will be installed to the location defined by '--prefix', which generally should be set to the project's space in /usr/local/usrapps.
Conda creates the equivalent of a module, and 'conda activate' does the equivalent of 'module load', i.e., it sets variables such as PATH and LD_LIBRARY_PATH to point to the proper locations in '--prefix'. (See more about environment variables here.)
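
One way to see this in practice is to check where a command resolves after activation. conda activate sets the CONDA_PREFIX variable to the environment location, so a command provided by the environment should resolve to a path under it. The following is a minimal sketch; the function name in_active_env is made up for this example.

```shell
# Hypothetical helper: succeed only if the named command resolves to a path
# inside the active Conda environment (CONDA_PREFIX is set by "conda activate").
# Returns 2 if no environment is active, 1 if the command is missing or is
# found somewhere else.
in_active_env() {
    if [ -z "${CONDA_PREFIX:-}" ]; then
        echo "no Conda environment is active" >&2
        return 2
    fi
    cmd_path=$(command -v "$1") || return 1
    case "$cmd_path" in
        "$CONDA_PREFIX"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Typical use, after "conda activate":
#   in_active_env pip && echo "pip comes from the active environment"
```

Running a check like this before pip install helps confirm that packages will go into the environment rather than somewhere else.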

Last modified: June 06 2025 00:26:34.