This is the current best practice for using Python on the HPC. Deprecated Python modules were removed on May 3rd, 2020.

Python

Python is an interpreted, object-oriented, high-level programming language. Python is often used for rapid prototyping; it is optimized for developer productivity rather than execution speed.

External Links:
Python website
Python: Getting Started

Python versions and updates

A default Python 2 is included in the Linux distribution installed on Henry2, and a staff-maintained Python 3 is available as a module. The system Python is updated whenever the system Linux is upgraded, and Conda (which defaults to Python 3) is updated as soon as upgrades become available. Updates to Conda will be posted on the Cluster Status page.

Many Python packages have dependencies, and updating a dependency for one package may break another package. For this reason, the staff-maintained Python does not contain any Python packages beyond the basic install. Virtual environments based on a particular Python version may break if the base Conda install is updated.

To avoid 'breaking' a Python environment, users should create Conda environments. A Conda environment should not change if the base Conda is updated.

To use the system Python 2 distribution, no environment changes are needed. To load the current Python 3 environment, use

module load conda 

Running Python

Do not use Python on a login node for anything other than compiling new Python packages or other non-computational tasks. To use Python interactively for testing and debugging, use an interactive session on a compute node.

Batch Python

Here is an example batch script. For more information on submitting batch scripts, see the documentation on running jobs.

Example:
For a Python script called hello.py, create a batch script submit.sh containing:

#!/bin/bash
#BSUB -W 10
#BSUB -n 1
#BSUB -o out.%J
#BSUB -e err.%J

module load conda   #delete if using system Python 2
python hello.py

Then use

bsub < submit.sh 

The above script will request one core for 10 minutes. Check the documentation on running jobs to customize the batch script.
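The contents of hello.py are not shown here. A minimal sketch follows, assuming the only goal is to confirm which host and which interpreter ran the job. The function name describe_runtime is hypothetical, and the script avoids Python 3 only syntax so that it should run under either the system Python 2 or the Conda Python 3.

```python
"""hello.py: a minimal test script for the batch example (illustrative sketch)."""
import platform
import socket
import sys


def describe_runtime():
    """Return one line naming the host, the Python version and the interpreter path."""
    return "Hello from %s: Python %s at %s" % (
        socket.gethostname(),
        platform.python_version(),
        sys.executable,
    )


if __name__ == "__main__":
    # Under LSF, standard output goes to the file named by '#BSUB -o' (out.%J).
    print(describe_runtime())
```

If the reported version starts with 2 when Python 3 was expected, the module load conda line was probably missing from the batch script.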

Python from a Conda environment

Creating the environment

To create an environment with a custom version of Python or a specific set of Python libraries, see the Conda documentation.

Running a script

To run a script with a custom version of Python or Python libraries, create a batch script that activates the Conda environment.

#!/bin/bash
#BSUB -W 10
#BSUB -n 1
#BSUB -o out.%J
#BSUB -e err.%J

conda activate /path/to/my_env
python hello.py
conda deactivate
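To confirm that a job really used the intended environment, a short check such as the sketch below can be run in place of hello.py. It relies on conda activate setting the CONDA_PREFIX environment variable and on sys.prefix pointing at the directory of the running interpreter. The function name env_report is hypothetical.

```python
"""Report whether the running interpreter belongs to the active Conda environment."""
import os
import sys


def env_report(environ, interpreter_prefix):
    """Compare CONDA_PREFIX (set by 'conda activate') with the interpreter's prefix."""
    conda_prefix = environ.get("CONDA_PREFIX")
    if conda_prefix is None:
        return "No Conda environment is active; python is " + interpreter_prefix
    # realpath resolves symbolic links so that equivalent paths compare equal.
    if os.path.realpath(conda_prefix) == os.path.realpath(interpreter_prefix):
        return "python runs from the active environment " + conda_prefix
    return "Mismatch: CONDA_PREFIX=%s but python is %s" % (
        conda_prefix,
        interpreter_prefix,
    )


if __name__ == "__main__":
    print(env_report(os.environ, sys.prefix))
```

A "Mismatch" line suggests that a different python was found first on the PATH, for example because the environment was not activated inside the batch script.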

Interactive test and debug

Interactive Compute Node

An interactive session may be used for short debugging or to prepare a production batch script. Interactive sessions must be kept to a minimum and used only when necessary. Long-running interactive sessions that leave nodes idle or underutilized may be terminated.

The following requests an interactive session with 1 core for 10 minutes and opens a prompt in Python 3.

bsub -Is -n 1 -W 10 bash
hostname   #This should not be a login node!
module load conda
python

Running distributed memory (MPI) Python

Most packages installed with Conda function with shared memory only, and these jobs must be submitted with #BSUB -R span[hosts=1]. See the video on Parallel Jobs for more information about shared vs distributed memory.
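As an illustration of shared memory parallelism, the sketch below starts a pool of worker processes on a single node using the standard multiprocessing module. It assumes that LSF exports the number of allocated cores in the LSB_DJOB_NUMPROC environment variable; if that variable is absent, it falls back to one worker. The names worker_count and square are hypothetical.

```python
"""Shared-memory parallelism on a single node (illustrative sketch)."""
import multiprocessing
import os


def worker_count(environ):
    """Return the number of cores granted to the job, or 1 if it cannot be determined.

    Assumption: LSB_DJOB_NUMPROC holds the value requested with '#BSUB -n'.
    """
    try:
        return max(1, int(environ.get("LSB_DJOB_NUMPROC", "1")))
    except ValueError:
        return 1


def square(x):
    """Stand-in for a real unit of work."""
    return x * x


if __name__ == "__main__":
    # All workers are processes on the same host, hence span[hosts=1].
    with multiprocessing.Pool(processes=worker_count(os.environ)) as pool:
        print(pool.map(square, range(8)))
```

Sizing the pool from the job allocation rather than from the total number of cores on the node keeps the script from using cores that were not requested.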

The mpi4py package allows users to parallelize their scripts with MPI. MPI-based packages installed with Conda will not function properly unless they are installed with the system MPI. See the following for instructions on installing mpi4py.

Hello World with mpi4py

A simple Conda environment can be created to do this test. See the documentation for initializing Conda if it was not yet initialized.

conda create --prefix /share/$GROUP/$USER/env_mpi4py pip psutil

Then, as per the instructions on installing mpi4py, do

conda activate /share/$GROUP/$USER/env_mpi4py
module load PrgEnv-intel
pip install mpi4py

This example uses mpi4py to say hello from each of 40 cores. Its purpose is to write the host names and core numbers in order to check whether Python is spawning workers correctly; if it is not, all of the workers may end up on one node. Submit using bsub < submit.sh. There were no errors.
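The test script itself is not reproduced here. A minimal sketch of such a script follows, assuming the environment created above (mpi4py plus psutil). The file name hello_mpi.py and the helper names are hypothetical. psutil is used only to look up the core number, and the script degrades to "unknown" where that lookup is unavailable.

```python
"""hello_mpi.py: each MPI rank reports its host and core (illustrative sketch)."""
import socket
import sys


def hello_line(rank, size, host, core):
    """Format the report for one MPI rank."""
    return "Hello from rank %d of %d on host %s, core %s" % (rank, size, host, core)


def current_core():
    """Return the core this process is running on, or 'unknown'.

    psutil's cpu_num() is not available on every platform.
    """
    try:
        import psutil
        return psutil.Process().cpu_num()
    except (ImportError, AttributeError):
        return "unknown"


def main():
    try:
        # Imported here so the helpers above can be used without MPI.
        from mpi4py import MPI
    except ImportError:
        sys.stderr.write("mpi4py is not importable; reporting as a single process\n")
        rank, size = 0, 1
    else:
        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()
    print(hello_line(rank, size, socket.gethostname(), current_core()))


if __name__ == "__main__":
    main()
```

Inside submit.sh the script would be started through the MPI launcher, for example mpirun python hello_mpi.py; that command is an assumption, since the submit script is not shown here. If every line of output names the same host, or every line reports rank 0 of 1, the workers were not distributed as intended.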

Last modified: March 27 2024 01:17:45.