Here we list the issues that are known to us and that you don't have to report to the Service Desk. Of course, if you encounter issues on Snellius not listed here then please let us know (through the Service Desk).
Hardware
Scratch filesystem performance
The GPFS scratch file system is not performing as expected: reading/writing large files performs as expected, but reading/writing many small files is much slower than expected. The current advice to users is: if you have been assigned a project space, which is user-managed scratch (/projects/0/<your_project_space_dir>), please use it rather than the shared scratch space (/scratch-shared/<your_dir_name>).
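For example (a minimal sketch, reusing the placeholder directory name above), a job could stage its working data under the project space instead of the shared scratch:
# Sketch: use your project space as the job's working directory
WORKDIR=/projects/0/<your_project_space_dir>/$SLURM_JOB_ID
mkdir -p "$WORKDIR"
cd "$WORKDIR"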
Our system admins are working to resolve this issue.
Applications
NCCL init hangs/crashes for GPU <=> GPU communication
NCCL is a communication library that offers optimized primitives for inter-GPU communication. We have found that it often hangs during initialization on Snellius, and the probability of a hang during init increases with the number of GPUs in the allocation. The issue is that NCCL sets up its communication using an ethernet-based network interface. By default, it selects the 'ib-bond0' interface, which supports IP over the InfiniBand network in Snellius. However, this interface seems to be experiencing issues.
As a workaround, you can configure NCCL to use the traditional ethernet interface by exporting the following environment variable:
export NCCL_SOCKET_IFNAME="eno"
Note that if you use mpirun as the launcher, you should make sure that the variable gets exported to the other nodes in the job too:
mpirun -x NCCL_SOCKET_IFNAME <my_executable_using_nccl>
(Note that when launching your parallel application with srun, your environment gets exported automatically, so this second step is not needed.)
The performance impact of this workaround is expected to be minimal: the traditional ethernet interface is only used to initialize the connection. Any further NCCL communication between nodes is performed using native InfiniBand.
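In a job script, the workaround could look as follows (a minimal sketch; the executable name is a placeholder):
#!/bin/bash
#SBATCH -p gpu
#SBATCH -N 2
#SBATCH --ntasks-per-node=4
#SBATCH --gpus-per-node=4
#SBATCH -t 20:00

module load ...

# Let NCCL set up its connections over the traditional ethernet interface
export NCCL_SOCKET_IFNAME="eno"

# srun exports the environment automatically; with mpirun, add -x NCCL_SOCKET_IFNAME
srun <my_executable_using_nccl>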
Cartopy: ibv_fork_init() warning
Users can encounter the following warning message when importing the "cartopy" and "netCDF4" modules in Python:
>>> import netCDF4 as nc4
>>> import cartopy.crs as ccrs
[1637231606.273759] [tcn1:3884074:0] ib_md.c:1161 UCX WARN IB: ibv_fork_init() was disabled or failed, yet a fork() has been issued.
[1637231606.273775] [tcn1:3884074:0] ib_md.c:1162 UCX WARN IB: data corruption might occur when using registered memory.
The issue is similar to the one reported here. The warning will disappear if "cartopy" is imported before "netCDF4".
Another solution is to disable OFI before running the python script:
$ export OMPI_MCA_btl='^ofi'
$ export OMPI_MCA_mtl='^ofi'
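A quick way to check the import-order workaround (a sketch; assumes a Python environment where both packages are available):
# Importing cartopy before netCDF4 should avoid the ibv_fork_init() warning
python -c "import cartopy.crs as ccrs; import netCDF4 as nc4; print('imports OK')"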
Tooling
Attaching to a process with GDB can fail
When using gdb -p <pid> (or the equivalent attach <pid> command in gdb) to attach to a process running in a SLURM job, you might encounter errors or warnings related to executable and library files that cannot be opened:
snellius paulm@gcn13 09:44 ~$ gdb /usr/bin/sleep -p 1054730
GNU gdb (GDB) Red Hat Enterprise Linux 8.2-15.el8
...
Reading symbols from /usr/bin/sleep...Reading symbols from .gnu_debugdata for /usr/bin/sleep...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Attaching to program: /usr/bin/sleep, process 1054730
Error while mapping shared library sections:
Could not open `target:/lib64/libc.so.6' as an executable file: Operation not permitted
Error while mapping shared library sections:
Could not open `target:/lib64/ld-linux-x86-64.so.2' as an executable file: Operation not permitted
Such issues will also prevent symbols from being resolved correctly, making debugging really difficult.
The reason this happens is that processes in a SLURM job get a slightly different view of the file system mounts (using a so-called namespace). When you attach GDB to a running process after using SSH to log into the node where the process is running, the gdb process will not be in the same namespace, which prevents GDB from directly accessing the binary (and its libraries) you are trying to debug.
The workaround is to use a slightly different method for attaching to the process:
$ gdb <executable>
(gdb) set sysroot /
(gdb) attach <pid>
For the example above, to attach to /usr/bin/sleep (PID 1054730) the steps would become:
# Specify the binary to attach to, so GDB can resolve its symbols
snellius paulm@gcn13 09:50 ~$ gdb /usr/bin/sleep
GNU gdb (GDB) Red Hat Enterprise Linux 8.2-15.el8
...
Reading symbols from /usr/bin/sleep...Reading symbols from .gnu_debugdata for /usr/bin/sleep...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Missing separate debuginfos, use: yum debuginfo-install coreutils-8.30-8.el8.x86_64

# Tell GDB to assume all files are available under /
(gdb) set sysroot /

# Attach to the running process
(gdb) attach 1055415
Attaching to program: /usr/bin/sleep, process 1055415
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
0x0000153fd299ad68 in nanosleep () from /lib64/libc.so.6
(gdb) bt
#0  0x0000153fd299ad68 in nanosleep () from /lib64/libc.so.6
#1  0x000055e495e8cb17 in rpl_nanosleep ()
#2  0x000055e495e8c8f0 in xnanosleep ()
#3  0x000055e495e89a58 in main ()
(gdb)
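If you first need to locate the process to attach to, something like the following can be used (a sketch; the node name is taken from the example above and the executable name is a placeholder):
# Find which node(s) your job is running on
squeue -u $USER -o "%i %N"

# Log in to the node (name taken from the squeue output)
ssh gcn13

# Look up the PID of the process you want to attach to
pgrep -u $USER -f <my_executable>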
Batch system
Allocating multiple GPU nodes
Normally, batch scripts like
#!/bin/bash
#SBATCH -p gpu
#SBATCH -n 8
#SBATCH --ntasks-per-node=4
#SBATCH --gpus=8
#SBATCH -t 20:00
#SBATCH --exclusive

module load ...

srun <my_executable>
should get you an allocation with 2 GPU nodes, 8 GPUs, and 4 MPI tasks per node. However, there is currently an issue related to specifying a number of GPUs larger than 4: jobs with the above SBATCH arguments that use OpenMPI and call srun or mpirun will hang.
Instead of specifying the total number of GPUs, please specify the number of GPUs per node, combined with the number of nodes. E.g.
#!/bin/bash
#SBATCH -p gpu
#SBATCH -N 2
#SBATCH --ntasks-per-node=4
#SBATCH --gpus-per-node=4
#SBATCH -t 20:00
#SBATCH --exclusive

module load ...

srun <my_executable>
This will give you the desired allocation with a total of 2 GPU nodes, 8 GPUs, and 4 MPI tasks per node, and the srun (or mpirun) will not hang.
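If you want to verify the allocation before the actual run, a quick sanity check from within the job could look like this (a sketch):
# Run one task per node to print the hostname and the GPUs visible there
srun --ntasks-per-node=1 bash -c 'hostname; nvidia-smi -L'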
Intermittent Slurm bug
A Slurm bug has been identified after the last upgrade (Feb. 28, 2025). The SchedMD team is working on it and will provide a bug fix soon. In rare cases, jobs are cancelled due to a "NODE FAILURE" error. We apologize for the inconvenience and reiterate that node failures are always reimbursed automatically.
MPI
For some use cases, OpenMPI on Software Stack 2023 (OpenMPI/4.1.5-GCC-12.3.0) is more performant than OpenMPI on Software Stack 2024 (OpenMPI/5.0.3-GCC-13.3.0).
We are working on a complete characterization of this issue and on how to obtain comparable performance.
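If your application is affected, falling back to the 2023 stack build may help in the meantime (a sketch, assuming the usual year-based stack modules on Snellius):
# Switch to the 2023 software stack and its OpenMPI build
module load 2023
module load OpenMPI/4.1.5-GCC-12.3.0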
MPI Communication Issues
We are currently experiencing intermittent communication issues between nodes when executing MPI jobs.
Error Example:
ORTE has lost communication with a remote daemon.
We are addressing and working towards resolving this issue. If you are experiencing similar problems, please submit a ticket for investigation. Upon verification that the issue is related to this communication problem, you will be eligible for reimbursement of the SBUs spent.
IntelMPI
Currently (Nov. 2024) we have been seeing intermittent failures with IntelMPI jobs and are investigating the issue. The failure might look like the following:
Abort(1615247) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(176)........:
MPID_Init(1546)..............:
MPIDI_OFI_mpi_init_hook(1558):
create_vni_context(2135).....: OFI endpoint open failed (ofi_init.c:2135:create_vni_context:Invalid argument)
We have two possible workarounds:
- Use OpenMPI
- Try these environment variables
export I_MPI_OFI_PROVIDER=verbs
export I_MPI_COLL_EXTERNAL=0
export I_MPI_FABRICS=shm:ofi
We will update this page when we have more information.
I am getting the following error when I run my MPI job
srun: error: Couldn't find the specified plugin name for mpi/pmix_v2 looking at all files
srun: error: cannot find mpi plugin for mpi/pmix_v2
srun: error: MPI: Cannot create context for mpi/pmix_v2
srun: error: MPI: Unable to load any plugin
srun: error: Invalid MPI type 'pmix_v2', --mpi=list for acceptable types
This error occurs when one tries to use srun along with a Process Management Interface (PMI) version that is not available. The reason for the non-availability could be that the PMI version was upgraded recently. A user can force a particular pmix version to be used by their application by invoking the command in the following manner:
srun --mpi=pmix_v2 <rest of the command>
In the above case, pmix_v2 is not available anymore.
Solution
The best way is to use srun without the --mpi option. Or, if you still want to force pmix usage, do not specify the version; the scheduler will choose the latest version that is installed:
srun --mpi=pmix <rest of the command>
If you want to list the PMI versions that are available, you can do that by executing the following on the command line:
$ srun --mpi=list
MPI plugin types are...
        none
        pmi2
        pmix
specific pmix plugin versions available: pmix_v4
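Putting this together, a job script would then simply request pmix without a version (a sketch; the executable name is a placeholder):
#!/bin/bash
#SBATCH -n 8
#SBATCH -t 20:00

module load ...

# Let the scheduler pick the latest installed pmix version
srun --mpi=pmix <my_executable>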
Some background regarding Process Management Interface (PMI):
PMI provides an API and a library that interacts with different MPI libraries via the API to facilitate inter-process communication. PMI libraries typically store processor/rank information in the form of a database, which the MPI libraries can query in order to perform communication. For further reading, please refer to: https://docs.openpmix.org/en/latest/history.html, https://link.springer.com/chapter/10.1007/978-3-642-15646-5_4, and https://dl.acm.org/doi/pdf/10.1145/3127024.3127027.