Nodes Overview
Node types
The set of Snellius nodes available to end-users comprises three interactive nodes and a large number of batch nodes, or "worker nodes". We distinguish the following node flavours:
int
: CPU-only interactive nodes
tcn
: CPU-only "thin" compute nodes, some of which have truly node-local NVMe-based scratch space
fcn
: CPU-only "fat" compute nodes, which have more memory than the default worker nodes as well as truly node-local NVMe-based scratch space
hcn
: CPU-only "high-memory" compute nodes with even more memory than the fat nodes
gcn
: GPU-enhanced compute nodes with NVIDIA GPUs, some of which have truly node-local NVMe-based scratch space
srv
: CPU-only, not-for-computing "service" nodes, primarily intended to facilitate the running of user-submitted jobs that automate data transfers into or out of the Snellius system
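On Snellius, jobs are steered to a particular node flavour through the Slurm partition they are submitted to. Below is a minimal sketch of a batch script for a thin node; the partition name `rome` and the executable name are illustrative assumptions, and the actual partition layout can be listed with `sinfo`.

```bash
#!/bin/bash
#SBATCH --partition=rome       # assumed name of a Rome thin-node (tcn) partition; verify with `sinfo`
#SBATCH --nodes=1
#SBATCH --ntasks=128           # one task per core on a 128-core Rome tcn
#SBATCH --time=01:00:00

srun ./my_application          # hypothetical executable
```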
The table below lists the currently available Snellius node types.
# Nodes | Node flavour | Lenovo Node Type | CPU SKU | CPU Cores per Node | Accelerator(s) | DIMMs | CPU Memory per node | Local storage | Network connectivity
---|---|---|---|---|---|---|---|---|---
3 | int | ThinkSystem SR665 | AMD EPYC 7F32 (2x), 8 cores/socket | 16 | N/A | 16 x 16 GiB | 256 GiB | | 
525 | tcn | ThinkSystem SR645 | AMD Rome 7H12 (2x), 64 cores/socket | 128 | N/A | 16 x 16 GiB | 256 GiB | A subset of 21 nodes has node-local NVMe scratch space | 
738 | tcn | ThinkSystem SD665 V3 | AMD Genoa 9654 (2x), 96 cores/socket | 192 | N/A | 24 x 16 GiB | 384 GiB | A subset of 72 nodes has node-local NVMe scratch space | 
72 | fcn (Rome) | ThinkSystem SR645 | AMD Rome 7H12 (2x), 64 cores/socket | 128 | N/A | 16 x 64 GiB | 1 TiB | Node-local NVMe scratch space | 
48 | fcn (Genoa) | ThinkSystem SD665 V3 | AMD Genoa 9654 (2x), 96 cores/socket | 192 | N/A | | | Node-local NVMe scratch space | 
2 | hcn (4 TiB) | ThinkSystem SR665 | AMD Rome 7H12 (2x), 64 cores/socket | 128 | N/A | 32 x 128 GiB, 2666 MHz DDR4 | 4 TiB | N/A | 
2 | hcn (8 TiB) | ThinkSystem SR665 | AMD Rome 7H12 (2x), 64 cores/socket | 128 | N/A | 32 x 256 GiB | 8 TiB | N/A | 
72 | gcn | ThinkSystem SD650-N V2 | Intel Xeon Platinum 8360Y (2x), 36 cores/socket | 72 | NVIDIA A100 (4x), 40 GiB HBM2 memory with 5 active memory stacks per GPU | 16 x 32 GiB | 512 GiB | A subset of 36 nodes has node-local NVMe scratch space | 
88 | gcn | ThinkSystem SD665-N V3 | AMD EPYC 9334 (2x), 32 cores/socket | 64 | NVIDIA H100 SXM5 (4x) | 24 x 32 GiB | 768 GiB | A subset of 22 nodes has node-local NVMe scratch space | 
7 | srv | ThinkSystem SR665 | AMD EPYC 7F32 (2x), 8 cores/socket | 16 | N/A | 16 x 16 GiB, 3200 MHz DDR4 | 256 GiB | | 
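Since each gcn carries four GPUs, GPU jobs are typically expressed in GPUs per node. A minimal sketch follows, assuming a partition named `gpu` (again an assumption; check `sinfo` for the real partition names):

```bash
#!/bin/bash
#SBATCH --partition=gpu        # assumed name of a gcn partition; verify with `sinfo`
#SBATCH --nodes=1
#SBATCH --gpus-per-node=4      # each gcn has 4 GPUs (A100 or H100)
#SBATCH --time=01:00:00

srun ./my_gpu_application      # hypothetical executable
```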
Nodes per expansion phase
Snellius has been built in three consecutive expansion phases, all of which are planned to remain in operation until the end of life of the machine. Because Snellius grows in phases, it becomes increasingly heterogeneous as phase 2 and phase 3 come into operation. To keep an unambiguous reference to node flavours (int, tcn, gcn, etc.), we introduce a node type acronym that combines the node flavour with the phase in which the node was installed (PH1, PH2, PH3). For example, a thin CPU-only node installed in phase 1 gets the node type acronym PH1.tcn.
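Whether these acronyms are also exposed as Slurm node features is not guaranteed by the naming scheme itself; the standard Slurm commands below show what the scheduler actually reports per partition and per node.

```bash
# List each partition with its node count and the feature strings of its nodes
sinfo -o "%P %D %f"

# Show full details (features, memory, GRES) for one node; substitute a real node name
scontrol show node <nodename>
```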
Phase 1 (Q3 2021)
The table below lists the Snellius node types available in Phase 1.
# Nodes | Node Flavour | Node Type Acronym | Lenovo Node Type | CPU SKU | CPU Cores per Node | Accelerator(s) | DIMMs | Memory per node | Local storage | Network connectivity
---|---|---|---|---|---|---|---|---|---|---
3 | int | PH1.int | ThinkSystem SR665 | AMD EPYC 7F32 (2x), 8 cores/socket | 16 | N/A | 16 x 16 GiB | 256 GiB | | 
504 | tcn | PH1.tcn | ThinkSystem SR645 | AMD Rome 7H12 (2x), 64 cores/socket | 128 | N/A | 16 x 16 GiB | 256 GiB | N/A | 
72 | fcn | PH1.fcn | ThinkSystem SR645 | AMD Rome 7H12 (2x), 64 cores/socket | 128 | N/A | 16 x 64 GiB | 1 TiB | | 
2 | hcn | PH1.hcn4T | ThinkSystem SR665 | AMD Rome 7H12 (2x), 64 cores/socket | 128 | N/A | 32 x 128 GiB, 2666 MHz DDR4 | 4 TiB | N/A | 
2 | hcn | PH1.hcn8T | ThinkSystem SR665 | AMD Rome 7H12 (2x), 64 cores/socket | 128 | N/A | 32 x 256 GiB | 8 TiB | N/A | 
36 | gcn | PH1.gcn | ThinkSystem SD650-N V2 | Intel Xeon Platinum 8360Y (2x), 36 cores/socket | 72 | NVIDIA A100 (4x), 40 GiB HBM2 memory with 5 active memory stacks per GPU | 16 x 32 GiB | 512 GiB | N/A | 
7 | srv | PH1.srv | ThinkSystem SR665 | AMD EPYC 7F32 (2x), 8 cores/socket | 16 | N/A | 16 x 16 GiB, 3200 MHz DDR4 | 256 GiB | | 
Phase 1A + 1B + 1C (Q4 2022)
# Nodes | Node Flavour | Node Type Acronym | Lenovo Node Type | CPU SKU | CPU Cores per Node | Accelerator(s) | DIMMs | Memory per node | Local storage | Network connectivity
---|---|---|---|---|---|---|---|---|---|---
21 | tcn | | ThinkSystem SR645 | AMD Rome 7H12 (2x), 64 cores/socket | 128 | N/A | 16 x 16 GiB | 256 GiB | | 
36 | gcn | | ThinkSystem SD650-N V2 | Intel Xeon Platinum 8360Y (2x), 36 cores/socket, 2.4 GHz (Speed Select SKU), 250 W | 72 | NVIDIA A100 (4x), 40 GiB HBM2 memory with 5 active memory stacks per GPU | 16 x 32 GiB | 512 GiB | | 
Phase 2 (Q3 2023)
# Nodes | Node Flavour | Node Type Acronym | Lenovo Node Type | CPU SKU | CPU Cores per Node | Accelerator(s) | DIMMs | Memory per node | Local storage | Network connectivity
---|---|---|---|---|---|---|---|---|---|---
714 | tcn | PH2.tcn | ThinkSystem SD665 V3 | AMD Genoa 9654 (2x), 96 cores/socket | 192 | N/A | 24 x 16 GiB | 384 GiB | N/A | 
Phase 2A (LISA replacement, Q3 2023)
# Nodes | Node Flavour | Node Type Acronym | Lenovo Node Type | CPU SKU | CPU Cores per Node | Accelerator(s) | DIMMs | Memory per node | Local storage | Network connectivity
---|---|---|---|---|---|---|---|---|---|---
72 | tcn | | ThinkSystem SD665 V3 | AMD Genoa 9654 (2x), 96 cores/socket | 192 | N/A | 24 x 16 GiB | 384 GiB | | 
Phase 3 (Q2 2024)
# Nodes | Node Flavour | Node Type Acronym | Lenovo Node Type | CPU SKU | CPU Cores per Node | Accelerator(s) | DIMMs | Memory per node | Local storage | Network connectivity
---|---|---|---|---|---|---|---|---|---|---
88 | gcn | PH3.gcn | ThinkSystem SD665-N V3 | AMD EPYC 9334 (2x), 32 cores/socket | 64 | NVIDIA H100 SXM5 (4x) | 24 x 32 GiB | 768 GiB | A subset of 22 nodes has node-local NVMe scratch space | 
Interconnect
All compute nodes on Snellius share the same interconnect, based on InfiniBand HDR100 (100 Gbps) in a fat-tree topology.
With the phase 2 and phase 3 extensions added there is still a single InfiniBand fabric, but part of it is based on InfiniBand NDR in order to connect the older tree and the new tree with sufficient bandwidth.
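From inside a job you can check the link rate of the adapter on the node you landed on. `ibstat` is part of the standard infiniband-diags tools (assuming they are installed on the compute nodes); on an HDR100 link it reports a rate of 100, i.e. 100 Gbps or roughly 12.5 GB/s of raw bandwidth per direction.

```bash
# Print the state and link rate of the local InfiniBand port(s)
ibstat | grep -E "State|Rate"
```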