Synopsis
On many of our HPC systems a modules environment is used to provide and manage available software. This page provides an overview of how modules work, and how you can leverage them.
Introduction
To use or build certain software packages, the environment (i.e. the environment variables) often has to be adapted. Examples are the PATH environment variable, which is used to locate programs, and the LD_RUN_PATH environment variable, which is used to locate shared libraries.
Environment modules (Lmod on Snellius and Lisa) are an attempt to simplify the user's experience within each cluster. In this tutorial we provide an example that describes the compilation of a program (called myprog.c) that uses NETCDF (a library to store and retrieve scientific data). The NETCDF package consists of several libraries and a few programs.
An example without the use of modules
In this example we will only use one library (libnetcdf.a) and one program (ncdump).
To compile the program, one has to know where the netcdf library and the include files are located. Let us assume that these locations are, respectively:
- /usr/local/netcdf/lib
- /usr/local/netcdf/include
Compilation of the program myprog.c:
cc -o myprog myprog.c -I/usr/local/netcdf/include -L/usr/local/netcdf/lib -lnetcdf
This is not too complicated, but one can imagine that compiling a program that uses many libraries can become cumbersome, and that problems arise when system management decides to move the libraries to other locations.
To use the program ncdump that comes with the NETCDF package, one would have two options. The first one is calling the program using its full path:
/usr/local/netcdf/bin/ncdump
Or extending the PATH and calling the program by its name:
PATH=/usr/local/netcdf/bin:$PATH
ncdump
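The effect of extending PATH can be illustrated with a small, self-contained sketch. It does not touch a real NETCDF installation: the temporary directory and the stand-in ncdump script are assumptions made only for this demonstration.

```shell
# Toy demonstration: a directory placed at the front of PATH wins the lookup.
# The directory and the stand-in "ncdump" script exist only for this demo.
demo_dir="$(mktemp -d)"
printf '#!/bin/sh\necho "demo ncdump"\n' > "$demo_dir/ncdump"
chmod +x "$demo_dir/ncdump"

# Prepend the demo directory, as one would prepend /usr/local/netcdf/bin.
PATH="$demo_dir:$PATH"
hash -r   # forget any cached command locations

resolved="$(command -v ncdump)"   # the path the shell would execute
output="$(ncdump)"                # runs the stand-in script
echo "ncdump resolves to: $resolved"
echo "output: $output"
```

Because the new directory is placed in front of the existing PATH, it takes precedence over any other ncdump that might be installed elsewhere.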
The same example, but now using modules
By loading the proper module, everything becomes much simpler.
First check which versions of the NETCDF package are available on the system by issuing the following commands:
module load 2022
module avail netcdf
The module avail netcdf command displays the available modules for all of the installed netcdf software on the system. In this case we will choose netcdf version 4.9.0, which was compiled using the 2022 compiler toolchain with GCC and OpenMPI.
module load netCDF/4.9.0-gompi-2022a
cc -o myprog myprog.c -lnetcdf
ncdump
On Snellius and Lisa it is no longer necessary to know where the NETCDF files are located; the only thing to remember is to issue the module load command. The module handles everything needed to locate programs and shared libraries.
How modules work
The module load netCDF/4.9.0-gompi-2022a command modifies some environment variables:
- The PATH variable is extended, so that the program ncdump can be found
- Some environment variables are extended with the locations of the include and library directories
This explains how the program ncdump can be found. For the simple compile command to work, the cc command has to know how to handle the environment variables that define the locations of the include and library directories. To make that possible, a wrapper has been written that translates this information and calls the 'real' compiler with the appropriate -I and -L flags. So cc is not the compiler itself, but a script that calls the 'real' compiler with the correct flags.
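The translation step performed by such a wrapper can be sketched as follows. This is only an illustration under stated assumptions, not the actual wrapper: the variable names DEMO_INCLUDE_DIRS and DEMO_LIBRARY_DIRS, the /opt/demo directories and the real-cc placeholder are hypothetical.

```shell
# Hypothetical variables that a module might extend; real names may differ.
DEMO_INCLUDE_DIRS="/usr/local/netcdf/include:/opt/demo/include"
DEMO_LIBRARY_DIRS="/usr/local/netcdf/lib:/opt/demo/lib"

# Turn a colon-separated directory list into repeated compiler flags,
# e.g. "a:b" with prefix -I becomes "-Ia -Ib".
dirs_to_flags() {
    prefix="$1"
    list="$2"
    flags=""
    old_ifs="$IFS"
    IFS=":"
    for dir in $list; do
        [ -n "$dir" ] && flags="$flags $prefix$dir"
    done
    IFS="$old_ifs"
    printf '%s\n' "${flags# }"   # strip the leading space
}

include_flags="$(dirs_to_flags -I "$DEMO_INCLUDE_DIRS")"
library_flags="$(dirs_to_flags -L "$DEMO_LIBRARY_DIRS")"

# A wrapper would now run the real compiler; here the command is only printed.
# "real-cc" is a placeholder name, not an existing program.
echo "real-cc $include_flags $library_flags -o myprog myprog.c -lnetcdf"
```

The user keeps typing the short compile command, while the directory lists maintained by the loaded modules are expanded into flags behind the scenes.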
Most used module commands
- module avail: lists the modules that are available
- module load modulename: sets the environment up to use modulename
- module add modulename: same as previous
- module display modulename: shows what is done when you issue the command module add modulename
- module unload modulename: removes the environment necessary for modulename
- module rm modulename: same as previous
- module list: shows the loaded modules
- module help: tells how to use the module command
- module spider: reports all the modules that can be loaded on a system
- module purge: clears the currently loaded set of modules
Features
- Users need to specify the full module names including version number in order to properly load the modules. Default modules are disabled.
- Example: In order to load Python as a module you need to specify the version, e.g. module load Python/3.10.4-GCCcore-11.3.0-bare
- Users are able to specify a partial match of a version.
- Example: abc/17 will try to match the “best” abc/17.*.*
- module avail and module spider will use case-independent sorting.
Caveats using modules
- Do not use the module command in login scripts (.bash_profile, .bashrc), unless really necessary. Load modules where necessary, for example in job scripts. For interactive use, place these commands in a file that you source at the beginning of a session. Placing module commands in login scripts can cause a job that ran fine a month ago to fail, because some minor changes were applied to the login scripts.
- Use only the modules you need, and understand why they are needed.
Often, the order in which modules are loaded is important, for example to create a good PATH variable. It is good to realize that loading a module that is already loaded has no effect: the module system 'remembers' which modules are already loaded. So, to reliably put the path of package 'one' before that of package 'two', irrespective of packages already loaded, specify:
module unload one two
module load two one
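The ordering behaviour described above can be illustrated with a toy model. This is a sketch of the idea only and not how Lmod is implemented: the functions demo_load and demo_unload, the variables DEMO_PATH and DEMO_LOADED, and the /opt/<package>/bin directories are all hypothetical.

```shell
# Toy model only: this is not how Lmod is implemented.
DEMO_PATH=""      # stands in for PATH
DEMO_LOADED=""    # space-separated list of "loaded" packages

demo_load() {
    for pkg in "$@"; do
        case " $DEMO_LOADED " in
            *" $pkg "*) continue ;;   # already loaded: nothing happens
        esac
        DEMO_LOADED="$DEMO_LOADED $pkg"
        DEMO_PATH="/opt/$pkg/bin${DEMO_PATH:+:$DEMO_PATH}"   # prepend
    done
}

demo_unload() {
    for pkg in "$@"; do
        # Drop the package from the list of loaded packages.
        new_loaded=""
        for p in $DEMO_LOADED; do
            [ "$p" = "$pkg" ] || new_loaded="$new_loaded $p"
        done
        DEMO_LOADED="$new_loaded"
        # Drop the package's directory from DEMO_PATH.
        new_path=""
        old_ifs="$IFS"; IFS=":"
        for d in $DEMO_PATH; do
            [ "$d" = "/opt/$pkg/bin" ] || new_path="${new_path:+$new_path:}$d"
        done
        IFS="$old_ifs"
        DEMO_PATH="$new_path"
    done
}

demo_load one two
demo_load one                      # no effect: 'one' is already loaded
path_after_reload="$DEMO_PATH"     # 'two' is still in front of 'one'

demo_unload one two
demo_load two one                  # 'one' is loaded last, so it ends up in front
path_after_reorder="$DEMO_PATH"

echo "after reload:  $path_after_reload"
echo "after reorder: $path_after_reorder"
```

In this model, simply loading 'one' again does not move it to the front; only unloading both packages and loading them in the desired order does.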
Modules and shells
The module command is shell-independent, but the implementation is shell-dependent:
- bash: module is an exported shell function
- csh/tcsh: module is an alias
- ksh: module is a shell function
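To see how a command name is implemented in the current shell, one can ask the shell itself. The sketch below assumes a POSIX-like shell such as bash or ksh (so not csh/tcsh); the helper describe_command and the stand-in function demo_fn are hypothetical names used only for illustration.

```shell
# Report how the current shell resolves a command name.
# "command -V" is POSIX; the exact wording of its output differs per shell.
describe_command() {
    command -V "$1" 2>/dev/null || echo "$1: not found in this shell"
}

demo_fn() { :; }            # stand-in shell function, defined only for this demo
describe_command demo_fn    # reported as a function
describe_command module     # where modules are set up in bash or ksh: a function
```

On a system without a modules environment, the last call simply reports that the name was not found.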
Module aliases
If you work interactively a lot and do not want to type full module names (and thus remember versions), it is possible to define module aliases. On systems where Lmod is used as the module system (i.e. Snellius and Lisa) you can create a file .modulerc.lua in your home directory and use it to define aliases. For example:
snellius paulm@int1 12:27 ~$ cat ~/.modulerc.lua
module_alias("py", "Python/3.10.4-GCCcore-11.3.0")
module_alias("cm", "CMake/3.23.1-GCCcore-11.3.0")
These aliases can then be used instead of their full module names:
snellius paulm@int1 12:37 ~$ module load 2022
snellius paulm@int1 12:37 ~$ module load py
snellius paulm@int1 12:37 ~$ module list
Currently Loaded Modules:
  1) 2022                            7) libreadline/8.1.2-GCCcore-11.3.0  13) OpenSSL/1.1
  2) GCCcore/11.3.0                  8) Tcl/8.6.12-GCCcore-11.3.0         14) Python/3.10.4-GCCcore-11.3.0
  3) zlib/1.2.12-GCCcore-11.3.0      9) SQLite/3.38.3-GCCcore-11.3.0      15) cURL/7.83.0-GCCcore-11.3.0
  4) binutils/2.38-GCCcore-11.3.0   10) XZ/5.2.5-GCCcore-11.3.0           16) libarchive/3.6.1-GCCcore-11.3.0
  5) bzip2/1.0.8-GCCcore-11.3.0     11) GMP/6.2.1-GCCcore-11.3.0          17) CMake/3.23.1-GCCcore-11.3.0
  6) ncurses/6.3-GCCcore-11.3.0     12) libffi/3.4.2-GCCcore-11.3.0
The aliases defined are listed by the module avail command as Global Aliases:
snellius paulm@int1 12:38 ~$ module avail
---------------------------------------------------------- Global Aliases ----------------------------------------------------------
   cm -> CMake/3.23.1-GCCcore-11.3.0    py -> Python/3.10.4-GCCcore-11.3.0
...
Some things to note. An alias is merely a different name for a full module name and thus does not provide any "smart" behaviour:
- In the above example the 2022 environment still needed to be loaded first, as otherwise the alias py would not resolve, because the aliased Python version is part of the 2022 environment
- Aliases are not updated automatically if new versions of a package are installed, nor do they attempt to find a different version if the linked one is missing
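Because aliases are not updated automatically, it can help to review them from time to time. The sketch below lists the aliases defined in an alias file as "name -> target" lines. It assumes that each module_alias call sits on its own line, as in the example above, and it works on a temporary demo copy rather than on the real ~/.modulerc.lua.

```shell
# Demo copy of an alias file; in practice this would be ~/.modulerc.lua.
rc_file="$(mktemp)"
cat > "$rc_file" <<'EOF'
module_alias("py", "Python/3.10.4-GCCcore-11.3.0")
module_alias("cm", "CMake/3.23.1-GCCcore-11.3.0")
EOF

# Print each alias as "name -> target" so outdated versions are easy to spot.
aliases="$(sed -n 's/^module_alias("\([^"]*\)", *"\([^"]*\)").*$/\1 -> \2/p' "$rc_file")"
printf '%s\n' "$aliases"

rm -f "$rc_file"   # remove the demo copy
```

The listed targets can then be compared by hand with the versions that module avail reports for the environment in use.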