On many of our HPC systems a modules environment is used to provide and manage available software. This page provides an overview of how modules work, and how you can leverage them.
In order to use or create certain software packages, the environment (i.e. the environment variables) often has to be adapted. Examples are the PATH environment variable, which is used to locate programs, and the LD_LIBRARY_PATH environment variable, which is used to locate shared libraries.
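To see which directories are actually searched, both variables can be split on their ':' separators and printed one entry per line. This is a plain bash sketch that works on any Linux system; the array names path_dirs and lib_dirs are only illustrative.

```bash
#!/usr/bin/env bash
# Split PATH on ':' into an array; programs are looked up in this order
IFS=':' read -r -a path_dirs <<< "$PATH"
for dir in "${path_dirs[@]}"; do
  echo "PATH: $dir"
done

# LD_LIBRARY_PATH is often unset; ${VAR:-} then expands to an empty string
IFS=':' read -r -a lib_dirs <<< "${LD_LIBRARY_PATH:-}"
for dir in "${lib_dirs[@]}"; do
  echo "LD_LIBRARY_PATH: $dir"
done
```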
Environment modules (Lmod on Snellius) are an attempt to simplify the user's experience on each cluster. In this tutorial we use as an example the compilation of a program (myprogram.c) that uses NETCDF, a library to store and retrieve scientific data. The NETCDF package consists of several libraries and a few programs.
An example without the use of modules
In this example we will only use one library (libnetcdf.a) and one program (ncdump).
In order to compile the program, one has to know where the netcdf library and the include files are located. Let us assume that these locations are respectively:
Compilation of the program
This is not too complicated, but one can imagine that compiling a program that uses many libraries becomes cumbersome, and that problems arise when system management decides to move the libraries to other locations.
To use the program ncdump that comes with the NETCDF package, one has two options. The first is to call the program using its full path:
Or extending the PATH and calling the program by its name:
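Both options can be tried without NETCDF installed, using a temporary directory as a stand-in for the installation and a placeholder script in place of the real ncdump. The directory, the placeholder script and the file name data.nc are hypothetical; on a real system you would use the actual install location.

```bash
#!/usr/bin/env bash
# Stand-in for a NETCDF installation prefix (a temporary directory)
netcdf_prefix=$(mktemp -d)
mkdir -p "$netcdf_prefix/bin"

# Placeholder 'ncdump' that only reports its arguments
cat > "$netcdf_prefix/bin/ncdump" <<'EOF'
#!/bin/sh
echo "ncdump stand-in called with: $*"
EOF
chmod +x "$netcdf_prefix/bin/ncdump"

# Option 1: call the program using its full path
"$netcdf_prefix/bin/ncdump" -h data.nc

# Option 2: extend PATH once, then call the program by its name
export PATH="$netcdf_prefix/bin:$PATH"
ncdump -h data.nc
```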
The same example, but now using modules
By loading the proper module, everything becomes much simpler:
First check which versions of the NETCDF package are available on the system. The command module avail netcdf displays the available modules for all of the installed netcdf software. In this case we choose netCDF version 4.9.0, which was compiled with the 2022 compiler toolchain using GCC and OpenMPI.
On Snellius it is no longer necessary to know where the NETCDF files are located; the only thing to remember is to issue the module load command. The module takes care of everything needed to locate the programs and shared libraries.
How modules work
The module load netCDF/4.9.0-gompi-2022a command modifies some environment variables:
- The PATH variable is extended, so that the program ncdump can be found
- Some environment variables are extended with the locations of the include and library directories
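Conceptually, what module load does to these variables can be mimicked with a small bash function. The helper prepend_path below is hypothetical (it is only named after the prepend_path function that Lmod modulefiles use), and the install prefix is made up; module display shows which variables a real module actually changes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: put a directory in front of a colon-separated
# variable such as PATH, roughly what a loaded module does
prepend_path() {
  local var=$1 dir=$2
  local current=${!var:-}     # indirect expansion: value of the variable named in $var
  if [ -z "$current" ]; then
    printf -v "$var" '%s' "$dir"
  else
    printf -v "$var" '%s:%s' "$dir" "$current"
  fi
  export "$var"
}

# Made-up install prefix, for illustration only
prefix=/opt/example/netCDF/4.9.0
prepend_path PATH "$prefix/bin"
prepend_path LD_LIBRARY_PATH "$prefix/lib"
echo "PATH now starts with: ${PATH%%:*}"
```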
This explains how the program ncdump can be found. For the simple compile command to work, the cc command has to know how to handle the environment variables that define the locations of the include and library directories. To make that possible, a wrapper has been written that translates this information and calls the 'real' compiler with the appropriate -I and -L flags. So cc is not the compiler itself, but a script that calls the 'real' compiler with the correct flags.
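The idea behind such a wrapper can be sketched in bash. Everything here is an assumption for illustration: the function name cc_wrapper, the variables DEMO_INCLUDE_PATH and DEMO_LIBRARY_PATH (stand-ins for whatever the site's modules set), gcc as the 'real' compiler, and the choice to print the command instead of running it.

```bash
#!/usr/bin/env bash
# Hypothetical compiler wrapper: turns colon-separated directory lists
# into -I and -L flags and puts them before the user's arguments
cc_wrapper() {
  local real_cc=gcc              # assumption: the 'real' compiler is gcc
  local -a flags=() dirs=()
  local dir

  # Each include directory becomes an -I flag
  IFS=':' read -r -a dirs <<< "${DEMO_INCLUDE_PATH:-}"
  for dir in "${dirs[@]}"; do
    if [ -n "$dir" ]; then flags+=("-I$dir"); fi
  done

  # Each library directory becomes an -L flag
  IFS=':' read -r -a dirs <<< "${DEMO_LIBRARY_PATH:-}"
  for dir in "${dirs[@]}"; do
    if [ -n "$dir" ]; then flags+=("-L$dir"); fi
  done

  # Dry run: print the command. A real wrapper would instead run
  #   exec "$real_cc" "${flags[@]}" "$@"
  echo "$real_cc ${flags[*]} $*"
}

# Example: locations as a module might set them (made-up paths)
export DEMO_INCLUDE_PATH=/opt/example/netcdf/include
export DEMO_LIBRARY_PATH=/opt/example/netcdf/lib
cc_wrapper -o myprogram myprogram.c -lnetcdf
# prints: gcc -I/opt/example/netcdf/include -L/opt/example/netcdf/lib -o myprogram myprogram.c -lnetcdf
```

Keeping the user's own arguments after the generated flags means libraries given with -l still follow the source file, which is the order the linker expects.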
Most used module commands
- module avail: lists the modules that are available
- module load modulename: sets up the environment to use modulename
- module add modulename: same as module load modulename
- module display modulename: shows what is done when you issue the command module add modulename
- module unload modulename: removes the environment settings necessary for modulename
- module rm modulename: same as module unload modulename
- module list: shows the loaded modules
- module help: tells how to use the module command
- module spider: reports all the modules that can be loaded on a system
- module purge: clears the current set of loaded modules
- Users need to specify the full module names including version number in order to properly load the modules. Default modules are disabled.
- Example: to load Python as a module you need to specify the version, i.e. module load Python/3.10.4-GCCcore-11.3.0-bare
- Users are able to specify a partial match of a version.
- Example: abc/17 will try to match the “best” abc/17.*.*
- module spider uses case-independent sorting.
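The idea of resolving a partial version to the "best" full version can be illustrated by picking the highest matching version with GNU sort -V. This is only a sketch under the assumption that "best" means "highest version"; the function best_match and the abc/... names are made up, and Lmod's own resolution rules are more elaborate.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the highest version among the candidates
# whose name starts with "<partial>." (e.g. abc/17 -> abc/17.*.*)
best_match() {
  local partial=$1 candidate
  shift
  for candidate in "$@"; do
    # The quoted part matches literally; .* is a glob: a dot, then anything
    if [[ $candidate == "$partial".* ]]; then
      printf '%s\n' "$candidate"
    fi
  done | sort -V | tail -n 1
}

best_match abc/17 abc/16.4.1 abc/17.0.2 abc/17.2.9 abc/17.2.10 abc/18.0.0
# prints: abc/17.2.10
```

Version sorting matters here: a plain alphabetical sort would place 17.2.10 before 17.2.9.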
Caveats when using modules
- Do not use the module command in login scripts (such as .bashrc) unless really necessary. Load modules where they are needed, for example in job scripts. For interactive use, place these commands in a file that you source at the beginning of a session. Module commands in login scripts can cause a job that ran fine a month ago to fail, because of some minor change that was made to the login scripts in the meantime.
- Use only the modules you need, and understand why they are needed.
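For interactive sessions, such a file could be created as in the sketch below. The file name ~/netcdf-env.sh is hypothetical; the module names are the ones used earlier on this page, and the same two module load lines can also go into a job script.

```bash
#!/usr/bin/env bash
# Create a file with the module commands for one project (hypothetical name)
env_file="$HOME/netcdf-env.sh"

cat > "$env_file" <<'EOF'
# Environment for working with NETCDF.
# Use it at the start of an interactive session with:  source ~/netcdf-env.sh
module load 2022
module load netCDF/4.9.0-gompi-2022a
EOF
```

The file has to be sourced rather than executed: a script runs in a child shell, and the environment changes made by module load are lost when that shell exits.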
Often, the order in which modules are loaded is important, for example to get a correctly ordered PATH variable. It is good to realize that loading a module that is already loaded has no effect: the module system 'remembers' which modules are already loaded. So, to reliably put the path of package 'one' before that of package 'two', irrespective of packages already loaded, first unload both packages and then load 'two' followed by 'one'. Since loading a module typically prepends its directories to PATH, the module loaded last ends up first.
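Both behaviours can be made concrete with a toy model in plain bash. This is not how Lmod is implemented: the functions toy_load and toy_unload, the packages one and two, and their /opt/<name>/bin directories are all hypothetical, and a separate variable toy_path stands in for PATH so the real PATH is left alone.

```bash
#!/usr/bin/env bash
# Toy model (not Lmod): loading prepends a package's bin directory,
# and loading an already loaded package does nothing
declare -A loaded=()     # set of loaded package names
toy_path=""              # stand-in for PATH

toy_load() {
  local pkg=$1
  if [ -n "${loaded[$pkg]:-}" ]; then return 0; fi   # already loaded: no effect
  loaded[$pkg]=1
  toy_path="/opt/$pkg/bin${toy_path:+:$toy_path}"
}

toy_unload() {
  local pkg=$1 entry new=""
  local -a entries=()
  unset "loaded[$pkg]"
  IFS=':' read -r -a entries <<< "$toy_path"
  for entry in "${entries[@]}"; do
    if [ "$entry" != "/opt/$pkg/bin" ]; then new="${new:+$new:}$entry"; fi
  done
  toy_path=$new
}

# 'one' is loaded first; loading it again later changes nothing
toy_load one
toy_load two
toy_load one
echo "$toy_path"    # /opt/two/bin:/opt/one/bin ('two' comes first)

# Unload both, then load 'two' before 'one': now 'one' comes first
toy_unload one
toy_unload two
toy_load two
toy_load one
echo "$toy_path"    # /opt/one/bin:/opt/two/bin
```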
Modules and shells
The module command is shell-independent, but its implementation is shell-dependent:
- bash: module is an exported shell function
- csh/tcsh: module is an alias
- ksh: module is a shell function
If you work interactively a lot and don't want to type full module names (and thus remember versions), you can define module aliases. On systems where Lmod is used as the module system (e.g. Snellius) you can create a file .modulerc.lua in your home directory and use it to define aliases. For example:
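A minimal sketch of adding such an alias from the shell: it appends a line to ~/.modulerc.lua that uses Lmod's module_alias function. The alias name py and the Python module it points to are the ones used elsewhere on this page; adapt both to the versions you actually use.

```bash
#!/usr/bin/env bash
# Append an alias to ~/.modulerc.lua ('>>' creates the file if needed
# and keeps any aliases that are already defined there)
cat >> "$HOME/.modulerc.lua" <<'EOF'
-- "py" is a short name for the full Python module name
module_alias("py", "Python/3.10.4-GCCcore-11.3.0-bare")
EOF
```

With this alias in place, module load 2022 followed by module load py should load that Python module.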
These aliases can then be used instead of their full module names:
The aliases defined are listed by the
module avail command as Global Aliases:
Some things to note, since an alias is merely a different name for a full module name and thus does not provide any "smart" behaviour:
- In the above example the 2022 environment still needs to be loaded first, because the aliased Python version is part of the 2022 environment; otherwise the alias py does not resolve
- Aliases are not updated automatically if new versions of a package are installed, nor do they attempt to find a different version if the linked one is missing