LIZA
System overview
The Linux Innovation Zone Amsterdam (LIZA) is a computer cluster designed for experimenting with various hardware platforms. The cluster-based design of LIZA uses a classical batch scheduler, Slurm, which makes it easy for researchers to deploy their experiments and benchmarks. Unlike Snellius, LIZA is not a production cluster, which allows the ETP system administrators to adapt the system's hardware and software configuration to users' needs. Additionally, LIZA offers a broader variety of node types and architectures.
Additionally, LIZA is integrated with a Liqid composable infrastructure platform, enabling dynamic allocation of hardware resources. This platform consists of two PCIe expansion boxes, each with 10 slots. These boxes house various PCIe devices that can be installed and dynamically provisioned to attached servers as needed. The servers are connected to the Liqid platform through host bus adaptors (HBAs), which allow the seamless attachment and detachment of PCIe devices. Currently, four servers are connected to the Liqid infrastructure, enabling up to 20 different PCIe devices to be dynamically allocated among them. This setup provides researchers with a highly flexible environment for testing a wide range of hardware configurations and resource combinations. Below, the LIZA cluster's nodes are listed, as well as the equipment installed in the Liqid boxes.
"Liqid composable infrastructure leverages industry-standard data center components to deliver a flexible, scalable architecture built from pools of disaggregated resources. Compute, networking, storage, GPU, FPGA, and Intel® Optane™ memory devices are interconnected over intelligent fabrics to deliver dynamically configurable bare-metal servers, perfectly sized, with the exact physical resources required by each deployed application.
Our solutions and services enable infrastructure to adapt and approach full utilization. Processes can be automated to realize further efficiencies to address better data demand associated with next-generation applications in AI, IoT deployment, DevOps, Cloud and Edge computing, NVMe- and GPU-over-Fabric (NVMe-oF, GPU-oF) support, and beyond."
Server list
# Nodes | Server | CPU | # Cores | Memory | # GB | Disk | # TB | Devices | Bus | Features | Observations |
---|---|---|---|---|---|---|---|---|---|---|---|
Owned equipment | | | | | | | | | | | |
4 | Dell PowerEdge R760xa | 2x Intel Xeon Gold 6526Y | 16 | 16x DDR5-5200 32GB | 512 | | | | PCIe 4.0 x16 | hwperf, emerald_rapids, sse4, avx512, gold_6526y, gpu_nvidia | |
16 | Dell PowerEdge T640 | 2x Intel Xeon Gold 5118 | 24 | 12x DDR4-2400 16GB | 192 | NVMe | 1.5 | | PCIe 3.0 x16 | hwperf, skylake, sse4, avx512, gold_5118, gpu_nvidia | |
2 | Dell PowerEdge T640 | 2x Intel Xeon Gold 6230 | 40 | 24x DDR4-2933 64GB | 1,536 | NVMe | 1.5 | | PCIe 3.0 x16 | hwperf, skylake, sse4, avx512, gold_6230, gpu_nvidia | |
1 | Dell PowerEdge T640 | 2x Intel Xeon Gold 6230 | 40 | 24x DDR4-2933 64GB | 1,536 | NVMe | 1.5 | | PCIe 3.0 x16 | hwperf, skylake, sse4, avx512, gold_6230, fpga_xilinx | |
6 | Dell PowerEdge C6420 | 2x Intel Xeon Gold 6230R | 52 | 12x DDR4-2933 32GB | 384 | NVMe | 3.0 | | | hwperf, skylake, sse4, avx512, gold_6230R | |
1 | Lenovo SR650V2 | 2x Intel Xeon Platinum 8360Y | 72 | 16x DDR4-3200 32GB | 512 | | | AMD Instinct MI210 64GB | PCIe 4.0 x16 | hwperf, skylake, sse4, avx512, platinum_8360, gpu_amd | |
1 | QuantaGrid S74G-2U2 | 1x NVIDIA Grace | 72 | LPDDR5X | 480 | NVMe | 1.0 | NVIDIA Hopper H100 96 GB | NVLink C2C / PCIe 5.0 x16 | nvidia_grace, gpu_nvidia | |
4 | Dell PowerEdge R7515 | 1x AMD EPYC 7702P | 64 | 16x DDR4-3200 32GB | 512 | NVMe | 6.4 | | PCIe 4.0 x16 | --on demand-- | Liqid nodes. |
2 | Liqid SmartStack 10 | | | | | | | | PCIe 4.0 x16 | | |
Loaned equipment | | | | | | | | | | | |
1 | | 2x AMD Genoa-X 9864 | 96 | | | | | | | | Equipment loaned during Q1 and Q2 of 2024, the fruit of a longstanding collaboration between Intel and SURF-ETP. |
User guide
Connecting to LIZA
To connect to LIZA, you will need to use the SSH protocol, which encrypts all the data and passwords exchanged between your local system and the LIZA system. The way you connect will depend on the type of local system you are using. In all cases, you will access LIZA through one of the login nodes. These are publicly accessible nodes that you use as a stepping stone to work with the batch system and compute nodes.
No long-running processes on login nodes
The login nodes are intended to be used for tasks such as preparing and submitting jobs, checking on the status of running jobs, and transferring data to and from the system. It is not allowed to use the login nodes for running, testing, or debugging processes, as this could negatively impact the experience of other users. To ensure that the login nodes remain usable for everyone, there is an automatic cleanup feature that will terminate processes that consume excessive CPU time or memory.
Open a terminal and type:
ssh <username>@liza.surf.nl
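For transferring data to and from the system, standard SSH-based tools work over the same connection; the commands below are a minimal sketch with placeholder file and directory names:

# Copy a local file to your LIZA home directory (illustrative paths)
scp ./results.tar.gz <username>@liza.surf.nl:~/

# Synchronize a local directory, resuming partial transfers (illustrative paths)
rsync -av --partial ./dataset/ <username>@liza.surf.nl:~/dataset/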
Requesting resources
There is only one Slurm partition on LIZA, which contains all the available nodes. To select specific node types, use the Slurm constraint option to request nodes with particular features. The table above lists the node features for each node type. The example below demonstrates how to run Slurm commands on different node types.
$ srun --constraint=gpu_amd --gpus=1 hostname
srun: job 358 queued and waiting for resources
srun: job 358 has been allocated resources
j14n2.mgt.liza.surf.nl

$ srun --constraint=gpu_intel --gpus=2 hostname
srun: job 359 queued and waiting for resources
srun: job 359 has been allocated resources
j16n1.mgt.liza.surf.nl
Note that no default values are applied: you get what you ask for. This means users are responsible for specifying the required resources with the appropriate Slurm flags (see Slurm sbatch).
--nodes=<minnodes>[-maxnodes]|<size_string>
--ntasks=<number>
--cpus-per-task=<ncpus>
--mem=<size>[units]
--gpus=[type:]<number>
Alternatively, you may consider using the --exclusive flag to allocate all CPUs and GRES on requested nodes. Note that by default, the --exclusive flag only allocates as much memory as requested; however, this behavior has been modified on LIZA to allocate all memory as well.
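For reference, a minimal batch script combining these flags with a feature constraint could look like the sketch below (the job name, resource amounts, and constraint are placeholders to adapt to your experiment):

#!/bin/bash
#SBATCH --job-name=example            # placeholder job name
#SBATCH --constraint=gpu_nvidia       # request nodes with the gpu_nvidia feature
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
#SBATCH --gpus=1
#SBATCH --time=01:00:00

srun hostname                         # replace with your actual workload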
The available features can be listed with the scontrol command; they are also shown in the table above.
$ scontrol show nodes | grep AvailableFeatures
AvailableFeatures=hwperf,skylake,sse4,avx512,platinum_8360,gpu_amd
AvailableFeatures=hwperf,skylake,sse4,avx512,gold_5118,gpu_nvidia
AvailableFeatures=hwperf,skylake,sse4,avx512,gold_5118,gpu_nvidia
AvailableFeatures=hwperf,skylake,sse4,avx512,gold_5118,gpu_nvidia
AvailableFeatures=hwperf,skylake,sse4,avx512,gold_5118,gpu_nvidia
AvailableFeatures=hwperf,skylake,sse4,avx512,gold_5118,gpu_nvidia
AvailableFeatures=hwperf,skylake,sse4,avx512,gold_5118,gpu_nvidia
AvailableFeatures=hwperf,skylake,sse4,avx512,gold_5118,gpu_nvidia
AvailableFeatures=hwperf,skylake,sse4,avx512,gold_5118,gpu_nvidia
AvailableFeatures=hwperf,skylake,sse4,avx512,gold_5118,gpu_nvidia
AvailableFeatures=hwperf,skylake,sse4,avx512,gold_5118,gpu_coral
AvailableFeatures=hwperf,skylake,sse4,avx512,gold_6230,fpga_xilinx
AvailableFeatures=hwperf,skylake,sse4,avx512,gold_6230,gpu_nvidia
AvailableFeatures=hwperf,skylake,sse4,avx512,gold_6230,gpu_nvidia
AvailableFeatures=nvidia_grace,gpu_nvidia
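If you prefer a per-node summary, a generic sinfo format string (standard Slurm, not LIZA-specific) prints each node's name next to its available features:

$ sinfo -N -o "%N %f"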
Software modules
As an experimental platform, LIZA is heterogeneous and dynamic. This dynamism brings a major challenge in terms of software installation: maintaining an extensive software module ecosystem on a non-production system is not feasible. In this context, the ETP administrators only install and support the latest versions of the vendors' toolchains, drivers, and software development kits, that is, Intel oneAPI, NVIDIA CUDA, AMD ROCm, Xilinx tools, and any other vendor toolchain that is needed. Beyond that, the ETP administrators will not install or support any other software package or library. Fortunately, the European Environment for Scientific Software Installations (EESSI) provides a well-fitted solution. In short, the aim of EESSI is to build a common stack of scientific software installations for HPC systems and beyond, including laptops, personal workstations, and cloud infrastructure.
Intel
The Intel OneAPI toolkit is installed on all x86_64 nodes and provides compilers, libraries, and tools optimized for high-performance computing. To initialize the environment, use:
$ source /opt/intel/oneapi/setvars.sh
This script sets up environment variables for all available OneAPI components. A typical output looks like:
:: initializing oneAPI environment ...
   -bash: BASH_VERSION = 5.2.15(1)-release
   args: Using "$@" for setvars.sh arguments:
:: advisor -- latest
:: ccl -- latest
:: compiler -- latest
:: dal -- latest
:: debugger -- latest
:: dev-utilities -- latest
:: dnnl -- latest
:: dpcpp-ct -- latest
:: dpl -- latest
:: ipp -- latest
:: ippcp -- latest
:: mkl -- latest
:: mpi -- latest
:: pti -- latest
:: tbb -- latest
:: umf -- latest
:: vtune -- latest
:: oneAPI environment initialized ::
The setvars.sh script configures your session to use the latest installed versions of each component. It is compatible with both batch and interactive shells and ensures consistent access to compilers (e.g., icc, icx, dpcpp), math libraries (MKL), parallel runtimes (TBB, MPI), and analysis tools (VTune, Advisor).
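As a sketch of a typical workflow, a small C program linked against MKL (saxpy.c is a hypothetical source file) could be built and run on one of the Intel nodes as follows:

$ source /opt/intel/oneapi/setvars.sh
$ icx -O2 -qmkl -o saxpy saxpy.c                      # hypothetical source file; -qmkl links Intel MKL
$ srun --constraint=gold_6230R --cpus-per-task=4 --mem=4G ./saxpy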
NVIDIA
CUDA is installed on all compute nodes equipped with NVIDIA GPUs. The system maintains the two most recent versions, and a generic symbolic link is provided for convenience (the /usr/local/cuda link always points to the latest available version):
$ ls -ld /usr/local/cuda*
lrwxrwxrwx  1 root root   22 Dec 18  2023 /usr/local/cuda -> /etc/alternatives/cuda
lrwxrwxrwx  1 root root   25 Dec 18  2023 /usr/local/cuda-12 -> /etc/alternatives/cuda-12
drwxr-xr-x 15 root root  283 Feb 12  2024 /usr/local/cuda-12.3
drwxr-xr-x 15 root root 4096 Jul 11 13:58 /usr/local/cuda-12.5
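To build against a specific CUDA version instead of the generic symlink, prepend its bin directory to your PATH; vector_add.cu below is a hypothetical source file:

$ export PATH=/usr/local/cuda-12.5/bin:$PATH
$ nvcc -o vector_add vector_add.cu                    # hypothetical CUDA source file
$ srun --constraint=gpu_nvidia --gpus=1 --mem=4G ./vector_add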
NVIDIA provides a set of preconfigured environment modules for the HPC SDK, available under /opt/nvidia/hpc_sdk/modulefiles.
$ module avail
------------------------------------------------ /opt/nvidia/hpc_sdk/modulefiles -------------------------------------------------
   nvhpc-byo-compiler/24.11        nvhpc-hpcx-cuda12/24.11        nvhpc-hpcx/25.1   (D)    nvhpc/24.11
   nvhpc-byo-compiler/25.1  (D)    nvhpc-hpcx-cuda12/25.1  (D)    nvhpc-nompi/24.11        nvhpc/25.1  (D)
   nvhpc-hpcx-2.20-cuda12/25.1     nvhpc-hpcx/24.11               nvhpc-nompi/25.1  (D)

  Where:
   D:  Default Module
These module files are included in the default MODULEPATH on all GPU-enabled nodes, so users can load them directly with the module load command. The following variants are available:
Module | NVIDIA compilers | MPI stack | Recommended use |
---|---|---|---|
nvhpc | ✅ Yes | ✅ Open MPI | General-purpose development and GPU workloads |
nvhpc-hpcx | ✅ Yes | ✅ HPC-X | Multi-node GPU jobs with InfiniBand and UCX support |
nvhpc-nompi | ✅ Yes | ❌ None | Custom or no MPI environments |
nvhpc-byo-compiler | ❌ No | ❌ None | External compiler toolchains with CUDA/NCCL support |
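As an illustration, a multi-node GPU run could use the nvhpc-hpcx variant from the table above (jacobi.c is a hypothetical OpenACC source file; pick the module version present on your node):

$ module load nvhpc-hpcx/25.1
$ mpicc -acc -o jacobi jacobi.c                       # hypothetical OpenACC source; mpicc wraps the NVIDIA compilers
$ srun --constraint=gpu_nvidia --nodes=2 --ntasks=2 --gpus=2 --mem=16G ./jacobi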
AMD
The AMD ROCm software stack is installed on all compute nodes equipped with AMD GPUs. Multiple ROCm versions may be present, and a generic symbolic link is provided for convenience. The /opt/rocm symlink points to the default ROCm version, which may change over time as new versions are installed. Users can either rely on this generic path or explicitly use a specific version such as /opt/rocm-6.2.0.
$ ls -ld /opt/rocm*
lrwxrwxrwx  1 root root   22 Apr 29  2024 /opt/rocm -> /etc/alternatives/rocm
drwxr-xr-x 35 1003 root 4096 Apr 29  2024 /opt/rocm-5.7.1
drwxr-xr-x  9 root root  122 Sep 11 19:51 /opt/rocm-6.2.0
Available tools under /opt/rocm/bin:
Tool | Description |
---|---|
hipcc | HIP C++ compiler driver (wrapper over clang++) |
hipconfig | Show HIP installation info and available platforms |
hipify-perl | Convert CUDA source code to HIP |
rocminfo | Display GPU devices and supported features |
rocm-smi | System management tool (GPU power, temp, ECC) |
rocgdb | GDB-based debugger for HIP |
rocprof | Lightweight ROCm profiler for kernels and memory |
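A minimal HIP workflow on the AMD GPU node might look like the following sketch (vector_add_hip.cpp is a hypothetical source file):

$ /opt/rocm/bin/hipcc -o vector_add_hip vector_add_hip.cpp    # hypothetical HIP source file
$ srun --constraint=gpu_amd --gpus=1 --mem=8G ./vector_add_hip
$ srun --constraint=gpu_amd rocm-smi                          # inspect the GPU on the allocated node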
EESSI Environment Modules
The European Environment for Scientific Software Installations (EESSI) is available system-wide on all nodes via the CVMFS file system. EESSI provides a modular, architecture-aware software stack built for HPC and scientific computing. The EESSI software stack is served through the CVMFS endpoint: /cvmfs/software.eessi.io. You can confirm that the repository is mounted with:
$ cvmfs_config stat -v software.eessi.io
Version: 2.13.1.0
...
A default EESSI module is exposed directly under environment modules:
$ module avail
--------------------------------------------- /cvmfs/software.eessi.io/init/modules ----------------------------------------------
   EESSI/2023.06

$ module load EESSI/2023.06
EESSI/2023.06 loaded successfully
This sets up the EESSI stack silently. The system will automatically detect your node’s CPU architecture and adjust paths accordingly. After loading, additional modules (e.g., compilers, MPI, math libraries) become available in the module avail list.
$ module load OpenMPI
$ mpicxx --version
g++ (GCC) 13.2.0
Copyright (C) 2023 Free Software Foundation, Inc.

$ which mpicxx
/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/intel/skylake_avx512/software/OpenMPI/4.1.6-GCC-13.2.0/bin/mpicxx
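Putting this together, a batch script using the EESSI stack for a small two-node MPI run could look like the sketch below (mpi_hello is a hypothetical executable built with the OpenMPI module shown above):

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --mem=8G
#SBATCH --time=00:30:00

module load EESSI/2023.06
module load OpenMPI

srun ./mpi_hello          # hypothetical MPI executable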
Xilinx FPGA
The LIZA system includes one node, j18n2, equipped with two AMD FPGA accelerators. Both cards are connected to the host via PCIe and are available to users as generic resources through Slurm.
Versal ACAP VCK5000
Alveo U250
Accessing FPGA nodes
To run workloads on the FPGA node, request an interactive allocation using salloc. You must specify memory, wall time, and the target FPGA card using the --gres flag:
$ salloc --mem=10G --gres=fpga:u250:1 -t 2:00:00       # for u250 card
$ salloc --mem=10G --gres=fpga:vck5000:1 -t 2:00:00    # for vck5000 card
After allocation, Slurm will place you directly on the j18n2 node, with the requested FPGA card accessible via PCIe. To confirm that the FPGA is visible to the system:
$ lspci -vd 10ee:
3b:00.0 Processing accelerators: Xilinx Corporation Alveo U250 XDMA Platform
        Subsystem: Xilinx Corporation Device 000e
        Flags: bus master, fast devsel, latency 0, NUMA node 0
        Memory at 38bff2000000 (64-bit, prefetchable) [size=32M]
        Memory at 38bff4040000 (64-bit, prefetchable) [size=256K]
        Capabilities: <access denied>
        Kernel driver in use: xclmgmt
        Kernel modules: xclmgmt

3b:00.1 Processing accelerators: Xilinx Corporation Alveo U250
        Subsystem: Xilinx Corporation Alveo U250
        Flags: bus master, fast devsel, latency 0, IRQ 185, NUMA node 0
        Memory at 38bff0000000 (64-bit, prefetchable) [size=32M]
        Memory at 38bff4000000 (64-bit, prefetchable) [size=256K]
        Memory at 38bfe0000000 (64-bit, prefetchable) [size=256M]
        Capabilities: <access denied>
        Kernel driver in use: xocl
        Kernel modules: xocl
Enabling XRT runtime
To run accelerated applications, you must load the Xilinx Runtime (XRT) environment.
$ source /opt/xilinx/xrt/setup.sh
Autocomplete enabled for the xbutil command
Autocomplete enabled for the xbmgmt command
XILINX_XRT        : /opt/xilinx/xrt
PATH              : /opt/xilinx/xrt/bin
LD_LIBRARY_PATH   : /opt/xilinx/xrt/lib
PYTHONPATH        : /opt/xilinx/xrt/python
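With the XRT environment loaded, you can also inspect the allocated card using the xbutil tool (output omitted here; the reported device will depend on the card you requested):

$ xbutil examine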
Depending on the card you’re targeting, set the platform path accordingly:
# For Alveo U250
$ export PLATFORM_REPO_PATHS=/opt/xilinx/platforms/xilinx_u250_gen3x16_xdma_4_1_202210_1

# For Versal VCK5000
$ export PLATFORM_REPO_PATHS=/opt/xilinx/platforms/xilinx_vck5000_gen4x8_qdma_2_202220_1

# Set the library path as well (required by some runtime tools):
$ export LIBRARY_PATH=/usr/lib/x86_64-linux-gnu
Once XRT is active, you can execute your precompiled FPGA application:
$ ./<executable_file_name>.xclbin
Development tools for FPGA programming
LIZA provides a full development stack for building custom FPGA kernels and applications, including:
Vitis
Vivado
Vitis HLS
Model Composer
To activate these tools (2022.2 release):
$ source /opt/xilinx/tools/Vitis/2022.2/settings64.sh
$ source /opt/xilinx/tools/Vivado/2022.2/settings64.sh
$ source /opt/xilinx/tools/Vitis_HLS/2022.2/settings64.sh
$ source /opt/xilinx/tools/Model_Composer/2022.2/settings64.sh
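As a sketch, a hardware build of a simple kernel with the Vitis compiler could look like the following (vadd.cpp and the kernel name vadd are hypothetical; the platform name must match one of the platforms listed earlier under /opt/xilinx/platforms):

$ source /opt/xilinx/tools/Vitis/2022.2/settings64.sh
$ v++ -c -t hw --platform xilinx_u250_gen3x16_xdma_4_1_202210_1 -k vadd -o vadd.xo vadd.cpp
$ v++ -l -t hw --platform xilinx_u250_gen3x16_xdma_4_1_202210_1 -o vadd.xclbin vadd.xo

Hardware builds can take a long time and require significant disk space, in line with the notes below.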
Only one user can access an FPGA card at a time. Use Slurm to avoid contention.
Development tools may require significant local disk space and compile time.
Precompiled .xclbin files must match the platform and tool version.