Synopsis

This page gives an overview of the Snellius supercomputer and details the various types of file systems, nodes, and system services available to end-users.

System overview

Snellius is the Dutch national supercomputer. It is a general-purpose capability system designed to be well balanced. If you need one or more of the following: many cores, large symmetric multi-processing nodes, high memory, a fast interconnect, a lot of work space on disk, or a fast I/O subsystem, then Snellius is the machine of choice.

Node types 

Snellius is planned to be built in three consecutive expansion phases. All phases are planned to remain in operation until the end of life of the machine. Because Snellius grows in phases, it will become increasingly heterogeneous once phases 2 and 3 are operational. To refer unambiguously to node flavours (e.g. int, tcn, gcn), we introduce a node type acronym that combines the node flavour with the phase in which the node was installed (PH1, PH2, PH3). For example, a thin CPU-only node installed in phase 1 has the node type acronym PH1.tcn.

The set of Snellius nodes available to end-users comprises three interactive nodes and a large number of batch nodes, or "worker nodes". We distinguish the following node flavours:

  • CPU-only interactive nodes (int),
  • CPU-only "thin" compute nodes (tcn),
  • GPU-enhanced compute nodes with NVIDIA GPUs (gcn),
  • CPU-only "fat" compute nodes (fcn) which have more memory than the default worker nodes as well as truly node-local NVMe based scratch space,
  • CPU-only high-memory compute nodes (hcn) with even more memory than fat nodes,
  • CPU-only not-for-computing "service" nodes (srv), which are primarily intended for running user-submitted jobs that automate data transfers into or out of the Snellius system.
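
To make the naming scheme concrete, here is a small shell function that turns a node type acronym into a readable description. It is a sketch, not an official tool: the function name describe_node_type is hypothetical, and it assumes acronyms of the form PH<phase>.<flavour>[<variant>], matching the flavour by its prefix so that a variant suffix such as the 4T in PH1.hcn4T is ignored.

```bash
# Hypothetical helper: describe a node type acronym such as PH1.tcn or PH1.hcn4T.
# Assumes the form PH<phase>.<flavour>[<variant>]; the flavour is matched by prefix.
describe_node_type() {
    acronym=$1
    phase=${acronym%%.*}    # text before the first dot, e.g. "PH1"
    rest=${acronym#*.}      # text after the first dot, e.g. "hcn4T"
    case $rest in
        int*) desc="CPU-only interactive node" ;;
        tcn*) desc="CPU-only thin compute node" ;;
        gcn*) desc="GPU-enhanced compute node" ;;
        fcn*) desc="CPU-only fat compute node" ;;
        hcn*) desc="CPU-only high-memory compute node" ;;
        srv*) desc="CPU-only service node" ;;
        *)    desc="unknown node flavour" ;;
    esac
    # ${phase#PH} strips the "PH" prefix, leaving only the phase number
    printf '%s: %s (phase %s)\n' "$acronym" "$desc" "${phase#PH}"
}

describe_node_type PH1.tcn      # PH1.tcn: CPU-only thin compute node (phase 1)
describe_node_type PH1.hcn4T    # PH1.hcn4T: CPU-only high-memory compute node (phase 1)
```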

Phase 1 (Q3 2021)

The table below lists the Snellius node types available in Phase 1.

# Nodes | Node Flavour | Node Type Acronym | Lenovo Node Type | CPU SKU | CPU Cores per Node | Accelerator(s) | DIMMs | Total memory per node (per core) | Other characteristics
3 | int | PH1.int | ThinkSystem SR665 | AMD EPYC 7F32 (2x), 8 Cores/Socket, 3.7GHz, 180W | 16 | N.A. | 16x16GiB, 3200MHz, DDR4 | 256 GiB (16 GiB) | 6.4TB NVMe SSD Intel P5600; 1xHDR100, 100GbE ConnectX-6 VPI Dual port; 2x25GbE SFP28 Mellanox OCP
504 | tcn | PH1.tcn | ThinkSystem SR645 | AMD Rome 7H12 (2x), 64 Cores/Socket, 2.6GHz, 280W | 128 | N.A. | 16x16GiB, 3200MHz, DDR4 | 256 GiB (2 GiB) | 1xHDR100 ConnectX-6 single port; 2x25GbE SFP28 OCP
72 | fcn | PH1.fcn | ThinkSystem SR645 | AMD Rome 7H12 (2x), 64 Cores/Socket, 2.6GHz, 280W | 128 | N.A. | 16x64GiB, 3200MHz, DDR4 | 1 TiB (8 GiB) | 6.4TB NVMe SSD Intel P5600; 1xHDR100 ConnectX-6 single port; 2x25GbE SFP28 OCP
2 | hcn | PH1.hcn4T | ThinkSystem SR665 | AMD Rome 7H12 (2x), 64 Cores/Socket, 2.6GHz, 280W | 128 | N.A. | 32x128GiB, 2666MHz, DDR4 | 4 TiB (32 GiB) | 1xHDR100 ConnectX-6 single port; 2x25GbE SFP28 OCP
2 | hcn | PH1.hcn8T | ThinkSystem SR665 | AMD Rome 7H12 (2x), 64 Cores/Socket, 2.6GHz, 280W | 128 | N.A. | 32x256GiB, 2666MHz, DDR4 | 8 TiB (64 GiB) | 1xHDR100 ConnectX-6 single port; 2x25GbE SFP28 OCP
36 | gcn | PH1.gcn | ThinkSystem SD650-N v2 | Intel Xeon Platinum 8360Y (2x), 36 Cores/Socket, 2.4GHz (Speed Select SKU), 250W | 72 | NVIDIA A100 (4x), 40 GiB HBM2 memory with 5 active memory stacks per GPU | 16x32GiB, 3200MHz, DDR4 | 512 GiB, plus 160 GiB HBM2 on the GPUs (7.111 GiB) | 2xHDR100 ConnectX-6 single port; 2x25GbE SFP28 LOM; 1x1GbE RJ45 LOM
7 | srv | PH1.srv | ThinkSystem SR665 | AMD EPYC 7F32 (2x), 8 Cores/Socket, 3.7GHz, 180W | 16 | N.A. | 16x16GiB, 3200MHz, DDR4 | 256 GiB (16 GiB) | 6.4TB NVMe SSD Intel P5600; 1xHDR100, 100GbE ConnectX-6 VPI Dual port; 2x25GbE SFP28 Mellanox OCP
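
The per-core memory figures in parentheses are the total memory per node divided by the number of CPU cores per node. The hypothetical helper below reproduces a few of them with awk; for the GPU nodes only the 512 GiB of host memory is divided over the 72 cores, the HBM2 memory on the GPUs is not included.

```bash
# Hypothetical helper: memory per core in GiB = node memory (GiB) / cores per node.
mem_per_core() {
    awk -v mem="$1" -v cores="$2" 'BEGIN { printf "%.3f\n", mem / cores }'
}

mem_per_core 256 16     # PH1.int: 16.000
mem_per_core 256 128    # PH1.tcn: 2.000
mem_per_core 1024 128   # PH1.fcn (1 TiB): 8.000
mem_per_core 512 72     # PH1.gcn, host DDR4 only (GPU HBM2 not counted): 7.111
```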

Phase 2 (Q3 2022)

An extension will be added with more CPU-only thin nodes (future generation AMD EPYC processors, 2 GB per core), with a peak performance of 5.1 PFLOP/s.

Phase 3 (Q3 2023)

There are three options for this extension:

  1. CPU thin nodes (same future generation AMD EPYC processors, aggregate: 2.4 PFLOP/s), or
  2. GPU nodes (future generation NVIDIA GPUs, aggregate: 10.3 PFLOP/s), or
  3. Storage (the amount still needs to be determined)

The choice will be made 1.5 years after the start of production of Phase 1 and will be based on actual usage and demand of the system.

When Phase 3 is complete Snellius will have a total performance (CPU+GPU) in the range 13.6 - 21.5 PFLOP/s. 

File systems

There are several filesystems available on Snellius:

File system | Quota (space) | Quota (files) | Speed | Shared between nodes | Mount point | Expiration | Backup
Home | 200 GiB | None | Normal | Yes | /home/<username> | 15 weeks after project expiration | Nightly incremental
Scratch-local | 8 TiB (counted over all scratch-* space used) | 3,000,000 (soft limit) | Fast | No | /scratch-local/<username> | Files older than 6 days are removed automatically | No backup
Scratch-shared | 8 TiB (counted over all scratch-* space used) | 3,000,000 (soft limit) | Fast | Yes | /scratch-shared/<username> | Files older than 14 days are removed automatically | No backup
Project | Based on request | Dependent on size of project space (see below) | Normal | Yes | /project/<project_name> | Project duration | No backup
Archive Service | Based on request | Based on request | Very slow | Yes | /archive/<username> | Project duration | Nightly

The home file system

Every user has their own home directory, which is accessible at /home/<login_name>. Your home directory has a default capacity quota of 200 GiB (i.e. 200 x 2^30 bytes). No quota on the number of files and directories is enforced.

The 200 GiB home directory is ample space for a work environment on the system for most users. If you think it is not sufficient to accommodate your work environment on Snellius, please contact our helpdesk. Note, however, that home directories are not intended for long-term storage of large data sets; SURF provides the archive facility for that. Home directories are also not suitable for fast, large-scale, or parallel I/O. Use scratch and/or project space (see below) for fast and voluminous job I/O.
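
To get a rough idea of how much of the 200 GiB quota a directory takes up, du can be used as sketched below. The helper name home_usage_percent is hypothetical, and because du reports what the local tools see rather than exactly what the file system's quota accounting uses, treat the result as an estimate; the myquota tooling described under "Disk quota" reports the enforced figure.

```bash
# Hypothetical helper: rough share of the 200 GiB home quota used by a directory.
# du -sk reports allocated space in KiB; 200 GiB = 200 x 2^20 KiB.
# Only an estimate: myquota reports the figure that is actually enforced.
home_usage_percent() {
    dir=${1:-$HOME}
    quota_kib=$((200 * 1024 * 1024))
    used_kib=$(du -sk "$dir" 2>/dev/null | cut -f1)
    awk -v u="${used_kib:-0}" -v q="$quota_kib" 'BEGIN { printf "%.1f%%\n", 100 * u / q }'
}

# Example:
#   home_usage_percent          # estimate for all of $HOME
```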

SURF provides a versioned incremental backup service for your home directory that runs overnight. Files that have been backed up are retained in the backup repository for three weeks after they have been deleted from the file system. We can restore files and/or directories when you accidentally remove them or overwrite them with the wrong contents, or when they are corrupted because of a storage hardware failure, provided, of course, that a version already existed and was successfully backed up. Note that no consistent backup can be made of a file that is removed, changed, truncated, or appended to while the backup process is running. The backup process therefore simply skips files that are open and in use by other processes.

To have a file successfully backed up:

  • the file must reside on the file system when the backup runs
  • the file must be closed
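
Because the backup skips files that are open, and cannot capture a consistent copy of a file that changes while it runs, one way to make long-running output backup-friendly is to write it to a temporary file and rename it into place only once it is complete and closed. A minimal sketch; the function name save_atomically is hypothetical, and it relies on mv within a single directory being a rename that replaces the target in one step.

```bash
# Hypothetical helper: read stdin into a temporary file next to TARGET, then
# rename it over TARGET. The temporary file is closed before the rename, and
# mv within one directory is a single rename, so TARGET is never half-written.
save_atomically() {
    target=$1
    tmp=$(mktemp "$target.XXXXXX") || return 1
    if cat > "$tmp"; then
        mv -f "$tmp" "$target"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Example (my_analysis is a placeholder for your own program):
#   my_analysis | save_atomically "$HOME/results/summary.txt"
```

Note that mktemp creates the temporary file with mode 0600, which the renamed file keeps.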

The scratch file system

The scratch file system is intended as fast temporary storage that can be used while running a job, and can be accessed by all users with a valid account on the system. 

Scratch automatic cleanup and lack of backup

For scratch space there is an automated expiration policy of 6/14 days (for scratch-local / scratch-shared). Files and directories that are older, i.e. that have not changed their contents for this duration, are removed automatically.

However, there is no guarantee that files are actually retained for the full 6 or 14 days. A serious hardware failure, for example, could cause the loss of files that have not yet reached that age.
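
To see which of your scratch files are close to being cleaned up, find with -mmin can list files last modified more than a given number of minutes ago. The sketch below assumes that the cleanup looks at modification time, which is what "have not changed their contents" suggests; the helper name expiring_soon is hypothetical.

```bash
# Hypothetical helper: list files under DIR last modified more than
# (LIMIT_DAYS - 1) days ago, i.e. files that may be removed within about a day.
# Assumption: the cleanup uses the modification time (mtime).
expiring_soon() {
    dir=$1
    limit_days=$2    # 6 for scratch-local, 14 for scratch-shared
    minutes=$(( (limit_days - 1) * 24 * 60 ))
    # -mmin +N matches files modified more than N minutes ago
    find "$dir" -type f -mmin +"$minutes" -print
}

# Example:
#   expiring_soon "/scratch-shared/$USER" 14
```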

SURF provides no backup service on scratch space. Job end results, or any other precious job output that you want to keep, must be copied in time to your home directory, to the SURF archive facility, or to an off-site storage facility of your choice.

A user's default scratch space capacity quota is 8 TiB (i.e. 8 x 2^40 bytes), which is counted over all of the user's data on scratch-local and scratch-shared.

The i-node quota (number of files and directories per user) is set at a soft limit of 3 million files per user, with a hard limit that is substantially higher. Most users will never hit the soft limit, because files older than 6 days (on scratch-local) or 14 days (on scratch-shared) are cleaned up automatically. Users who produce enormous numbers of files per job may have to clean up files and directories themselves after each job, as they could reach their quota before the automatic cleanup runs.

If the soft limit is reached, a grace period of 7 days starts counting down. If you clean up within the grace period and do not reach the hard limit, the limit has no further effect. If the hard limit is reached, or if you fail to bring your usage below the soft limit within the grace period, the file system refuses to create new files and directories for you.
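
If you produce many files per job, you can keep an eye on the i-node count with find. The helpers below are hypothetical and only count entries under the directory you pass in, whereas the quota covers all of your scratch-local and scratch-shared data; use the myquota tooling for the authoritative count.

```bash
# Hypothetical helpers: count the files and directories below DIR and compare
# the count with the 3,000,000 soft limit. Only DIR itself is counted.
inode_count() {
    # -printf '.' prints one byte per entry, so names containing newlines
    # are still counted once (GNU find)
    find "$1" -mindepth 1 -printf '.' 2>/dev/null | wc -c | tr -d ' '
}

check_inodes() {
    count=$(inode_count "$1")
    soft_limit=3000000
    if [ "$count" -ge "$soft_limit" ]; then
        echo "WARNING: $count entries, soft limit is $soft_limit"
    else
        echo "OK: $count entries, soft limit is $soft_limit"
    fi
}

# Example:
#   check_inodes "/scratch-shared/$USER"
```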

Accessing scratch space

Scratch space can be accessed on all nodes from two locations:

  • /scratch-local/  
  • /scratch-shared/

/scratch-local/ specifies a unique location on each node (and so acts like it is local), whereas /scratch-shared/ denotes the same location on every node:

# Different content for /scratch-local/paulm, depending on the node

snellius paulm@int1 14:26 ~$ ls -l /scratch-local/paulm
total 0
-rw-rw---- 1 paulm paulm 0 Mar  3 14:26 hello.txt 

snellius paulm@int3 14:26 ~$ ls -l /scratch-local/paulm/
total 0
 
# Same content for /scratch-shared/paulm

snellius paulm@int1 14:26 ~$ ls -l /scratch-shared/paulm/
total 4
drwxr-sr-x 2 paulm paulm 4096 Feb 17 22:16 Blender

snellius paulm@int3 14:26 ~$ ls -l /scratch-shared/paulm
total 4
drwxr-sr-x 2 paulm paulm 4096 Feb 17 22:16 Blender

So you can use /scratch-local to give the processes of a job that run on different nodes a guaranteed unique location for storing and retrieving data, without interfering with each other. In fact, the TMPDIR environment variable is set to a default value of /scratch-local/<loginname>, and the corresponding directory either already exists or is created when you log in or when a batch job starts.
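
Since $TMPDIR is unique per job step and per node rather than per process, several processes of one job step running on the same node share it. If they may write files with the same names, giving each its own subdirectory keeps them apart. A minimal sketch using mktemp -d -p; the helper name make_private_workdir is hypothetical.

```bash
# Hypothetical helper: create a private subdirectory under $TMPDIR and print
# its path. ${TMPDIR:?...} aborts with a message if TMPDIR is not set.
make_private_workdir() {
    mktemp -d -p "${TMPDIR:?TMPDIR is not set}" work.XXXXXX
}

# Example:
#   workdir=$(make_private_workdir) || exit 1
#   cd "$workdir"
```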

The /scratch-shared/ directory behaves like scratch space that is shared by all nodes. Please create your own subdirectory under /scratch-shared, e.g. with the command:

$ mktemp -d -p /scratch-shared

A note on "local" scratch

Note that the /scratch-local/ directories are not truly (physically) local to a node. All /scratch-local/ directories are in fact visible from all nodes (and by all users), if you know their canonical (fully resolved) paths. This can be seen with:

$ readlink -f $TMPDIR 
/gpfs/scratch1/nodespecific/int1/<loginname>

In fact, all scratch-local and scratch-shared paths are symbolic links to directories that store data on the same underlying GPFS file system, and they share the same single per-user quota regime, as mentioned above.

Node-local system directories

Use of /tmp, /var/tmp, ...

Truly local directories, such as /tmp and /var/tmp, should be regarded as "off limits" for users. They are too small and too slow to be used for job output. Furthermore, they are needed by the operating system itself. They can be emptied without notice when a node is rebooted or re-installed, and on several other occasions.

If you (accidentally) fill up /tmp  or /var/tmp  on a node, the operating system will experience problems. Ultimately your job (and on an interactive node you and other users as well) will experience problems, and our system administrators and/or your fellow users won't like you.

  • Use the scratch file systems instead.
  • In your job command files you can use $TMPDIR. This is a directory in /scratch-local that is unique per job step (and therefore also unique per node).
  • On the login node you can also use $TMPDIR
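
A common pattern in a job command file is to copy input to $TMPDIR, work there, and copy the results back to a persistent location (home or project space) before the job ends. The sketch below wraps that in a hypothetical function run_in_tmpdir; the program and paths in the example are placeholders.

```bash
# Hypothetical helper for a job command file: copy INPUT to $TMPDIR, run
# COMMAND there with INPUT on stdin, and copy the output back to RESULT_DIR.
run_in_tmpdir() {
    input=$1
    result_dir=$2
    shift 2
    : "${TMPDIR:?TMPDIR is not set}"
    name=$(basename "$input")
    cp "$input" "$TMPDIR/$name" || return 1
    # Run in a subshell so the caller's working directory is unchanged
    ( cd "$TMPDIR" && "$@" < "$name" > "$name.out" ) || return 1
    mkdir -p "$result_dir" && cp "$TMPDIR/$name.out" "$result_dir/"
}

# Example (program and paths are placeholders):
#   run_in_tmpdir "$HOME/data/input.dat" "$HOME/results" "$HOME/bin/my_program"
```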


The project file system

The purpose of project space is to enable fast and bulky reading and/or writing of files by large and/or many jobs. A project space is not meant for long-term storage of data. No automatic backup of data on project spaces is provided. In some sense, project spaces can be seen as "user-managed scratch". This implies that project users themselves must take care not to run into their quota limit, and must back up and recover their data when the project expires.

Practically speaking a project file system can be used when:

  1. you need additional storage space, but do not require a backup.
  2. you need to share files within a collaboration.

By default, accounts on our systems do not come with a project space. You can request one when you apply for an account, or later by contacting our service desk. Different conditions may apply depending on the type of account; contact us to find out whether your account is eligible for a project space.

No backup for project spaces

Note that SURF provides no backup service on project spaces. If you have not arranged for a backup, and an associated restore possibility, your data will be irrevocably lost in case serious damage is caused to your files or to the file system at large (e.g. by failing hardware or human error). SURF provides the archive facility for long-term data storage, but you may of course also use off-site storage of your choice. Either way, it is your own responsibility to archive your data and to keep track of what you archived, when, and where.

End date and expiration

The project space itself has an agreed-upon end date. However, there is no expiration policy for the age of individual files and directories in your project space. Project users themselves must take care not to run into their quota limits, by deleting, compacting, and/or archiving data that is no longer needed.

When the agreed upon period of time of your Snellius project space expires, the project space will be made inaccessible. If no further notice from the project space users is received, the files and directories in your project space will eventually be deleted after a grace period of an additional four weeks.

All members of the group used for quota administration will receive a notification at the e-mail address registered in the SURF user administration, 30 days in advance of the expiration date. A second notification will be sent the day after expiration.

In principle, the lifetime of a project directory is not extended beyond the lifetime of its associated compute project, as project spaces of projects that are no longer active waste high-performance storage resources. In some cases, however, a follow-up project could make efficient use of the same data without first having to stage it from an archive into a new project space. This may be a valid reason for retaining a Snellius project space "in between projects". In that case it is mandatory to demonstrate, before the grace period has ended, that the proposal for the follow-up project (the intended "heir" of the project space) has actually been submitted. New limits and expiration dates will have to be established and motivated by the needs of the follow-up project.

Quota on size and number of files

The exact capacity is project dependent. The quota on the maximum number of files is derived from the capacity quota: each project is allocated a basic quota of 1 million files, plus a surplus that is a non-linear function of the capacity quota. The table below lists some reference values for the resulting file-count quota and the corresponding average file size. Note that for large project spaces the average file size must be larger than for smaller ones.

Capacity (TiB) | Number of files | Avg. file size (MiB)
1 | 1,000,000 | 1.05
5 | 1,359,881 | 3.86
10 | 1,728,141 | 6.07
50 | 3,766,218 | 13.92
100 | 5,605,170 | 18.71
200 | 8,492,952 | 24.69
300 | 10,879,241 | 28.91
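
The average file size column follows from the other two: with 1 TiB = 2^20 MiB, it is the capacity times 2^20 divided by the number of files. A hypothetical awk-based helper that reproduces the table values:

```bash
# Hypothetical helper: average file size (MiB) = capacity (TiB) x 2^20 / number of files
avg_file_size_mib() {
    awk -v tib="$1" -v files="$2" 'BEGIN { printf "%.2f\n", tib * 1024 * 1024 / files }'
}

avg_file_size_mib 1 1000000      # 1.05
avg_file_size_mib 100 5605170    # 18.71
```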

Project space quota are per group

Quota on project file systems are per group, rather than per user. Users of the project space must be members of the group used for quota administration for the project, and they must write files and directories with this group ownership. In most cases this works correctly by default, but some commands that try to preserve the original group ownership (e.g. "rsync -a" or "cp -p") will fail without extra options. See the tutorial on using project space for sharing files for more information.
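
One way to copy data into a project space without carrying over the source group is to preserve only modes and timestamps. A sketch with GNU cp (the function name copy_into_project is hypothetical); the comment shows the corresponding rsync options, since rsync -a is shorthand for -rlptgoD.

```bash
# Hypothetical helper: copy the contents of SRC into DEST (e.g. a directory
# under /project/<project_name>) without preserving the source group, so the
# copies get the group any new file created in DEST would get.
copy_into_project() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    # Unlike "cp -p", this does not try to preserve ownership (owner/group).
    cp -R --preserve=mode,timestamps "$src"/. "$dest"/
}

# rsync equivalent: -rlpt is -a without -g (group), -o (owner) and -D (devices)
#   rsync -rlpt src/ /project/<project_name>/dest/
```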

For users involved in more than one data project, it is technically possible to store data in any of their project directories using any quota group they are a member of, more or less at random. This is unwanted behaviour: files and directories whose group ownership is used for the quota administration of a particular data project must all be placed under that project's root directory. Conversely, only files and subdirectories belonging to the project should be placed under that directory. SURF will enforce these rules, if needed, with periodic corrective actions that change group ownership without further notice.

The archive file system

The Data Archive is not a traditional file system, nor is it specific to Snellius. It is an independent facility for long-term storage of data, which uses a tape storage backend. It is accessible from other systems as well. For more information, see this separate page about the archive

The archive service is intended for long-term storage of large amounts of data. Most of this data is (eventually) stored on tape, so accessing it may take a while. For users of the data archive it is accessible from login and staging nodes at the path /archive/<username>.

The archive system is designed to handle only large files efficiently. If you want to archive many smaller files, please first bundle them into a single (compressed) tar file before copying it to the archive. Never store a large number of small files on the archive: they may be scattered across different tapes, and retrieving them all at a later stage would put a heavy load on the archive. See this section of the documentation for more information on using the archive appropriately.
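
Bundling a directory into one compressed tar file before archiving it can look like this. The helper name bundle_for_archive and the paths are placeholders; listing the archive with tar -tzf before copying is a cheap check that the file is readable.

```bash
# Hypothetical helper: pack directory SRC_DIR into one gzip-compressed tar
# file OUT, so the archive receives one large file instead of many small ones.
bundle_for_archive() {
    src_dir=$1
    out=$2
    # -C changes to the parent directory so the tar file holds relative paths
    tar -czf "$out" -C "$(dirname "$src_dir")" "$(basename "$src_dir")"
}

# Example (paths are placeholders):
#   bundle_for_archive "$HOME/run42" "/scratch-shared/$USER/run42.tar.gz"
#   tar -tzf "/scratch-shared/$USER/run42.tar.gz" > /dev/null &&
#       cp "/scratch-shared/$USER/run42.tar.gz" "/archive/$USER/"
```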

Disk quota

You can check your home directory quota, scratch quota, and project space quota using the myquota end-user tooling available on the system. These commands are installed in the directory "/gpfs/admin/hpc/usertools".
For more information on how to enable these commands and how to use them, please read our Tutorial 'myquota' end-user tooling, Snellius/GPFS implementation.
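
Assuming that enabling the tools amounts to adding their directory to your PATH (the tutorial describes the supported procedure), a shell snippet that does so only once could look like this; it also assumes the tool is invoked as myquota.

```bash
# Assumption: enabling the tools means adding their directory to PATH.
usertools=/gpfs/admin/hpc/usertools
case ":$PATH:" in
    *":$usertools:"*) ;;                        # already on PATH, do nothing
    *) PATH="$PATH:$usertools"; export PATH ;;
esac

# Afterwards, on Snellius (assuming the command is called myquota):
#   myquota
```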

Interconnect

All compute nodes on Snellius will use the same interconnect, which is based on InfiniBand HDR100 (100 Gbit/s) in a fat-tree topology.

Hostkey fingerprints

When you log in to a new system for the first time with the SSH protocol, the system returns a hostkey fingerprint to you:

The authenticity of host 'snellius.surf.nl (145.136.63.187)' can't be established.
ED25519 key fingerprint is SHA256:2Vy9858ldWu3Xjt1a58MbhD5CjLIh1LCb8n/up0izGw.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added 'snellius.surf.nl' (ED25519) to the list of known hosts.

Before you type "yes", you can verify this fingerprint against the list of correct fingerprints for Snellius below, to check that you are indeed logging in to the correct system.

ED25519
===
SHA256:2Vy9858ldWu3Xjt1a58MbhD5CjLIh1LCb8n/up0izGw
MD5:22:2d:8c:fa:ca:24:a8:de:6d:08:c2:ad:a2:34:19:61
ECDSA
===
SHA256:BWIyocmUn0wm9gkNhc9CG5MPEQcHFCHxtyPtmkVMbak
MD5:ee:f3:26:54:11:ec:dd:d5:9f:8e:c1:94:fa:99:55:ea
RSA
===
SHA256:saJqHp4Ls1P+23/N/9Jt5kMWGvX8OpqUgZxYUZdV9+s
MD5:21:ac:01:67:67:e4:e8:7b:70:e8:c3:90:d2:02:9f:88
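
Instead of comparing fingerprints by eye, you can let the shell do the comparison. In the sketch below the SHA256 fingerprints are copied from the list above and the helper name is hypothetical. Note that ssh-keyscan fetches the key over the same network path that ssh uses, so the check is only meaningful because the published list comes from a separate, trusted source.

```bash
# Published SHA256 host key fingerprints for Snellius (ED25519, ECDSA, RSA).
SNELLIUS_FINGERPRINTS='SHA256:2Vy9858ldWu3Xjt1a58MbhD5CjLIh1LCb8n/up0izGw
SHA256:BWIyocmUn0wm9gkNhc9CG5MPEQcHFCHxtyPtmkVMbak
SHA256:saJqHp4Ls1P+23/N/9Jt5kMWGvX8OpqUgZxYUZdV9+s'

# Hypothetical helper: succeed only if $1 exactly matches one of the lines above.
is_snellius_fingerprint() {
    printf '%s\n' "$SNELLIUS_FINGERPRINTS" | grep -qxF -- "$1"
}

# Fetch the ED25519 key the server offers and compare its fingerprint
# (needs network access; ssh-keygen -l prints "bits fingerprint host (type)"):
#   fp=$(ssh-keyscan -t ed25519 snellius.surf.nl 2>/dev/null | ssh-keygen -lf - | awk '{print $2}')
#   is_snellius_fingerprint "$fp" && echo "fingerprint matches"
```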


It is also possible to configure your SSH client to retrieve the correct SSH hostkey fingerprints from the SURF DNS automatically, so that you do not have to check these fingerprints manually. To enable this, add the following line to your ~/.ssh/config: VerifyHostKeyDNS yes. Alternatively, you can use the following SSH command-line option to enable it temporarily: -o VerifyHostKeyDNS=yes
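
The configuration can also be added from the shell. The sketch below (hypothetical function name enable_hostkey_dns) scopes the option to a "Host snellius.surf.nl" block instead of setting it for all hosts, which is a choice made here rather than a requirement, and it does nothing if the file already mentions VerifyHostKeyDNS.

```bash
# Hypothetical helper: append a Host block enabling VerifyHostKeyDNS for
# snellius.surf.nl, unless the file already mentions VerifyHostKeyDNS.
enable_hostkey_dns() {
    config=${1:-$HOME/.ssh/config}
    mkdir -p "$(dirname "$config")" || return 1
    touch "$config" || return 1
    if ! grep -qi '^[[:space:]]*VerifyHostKeyDNS' "$config"; then
        printf '\nHost snellius.surf.nl\n    VerifyHostKeyDNS yes\n' >> "$config"
    fi
}

# Example:
#   enable_hostkey_dns                                       # edits ~/.ssh/config
#   ssh -o VerifyHostKeyDNS=yes <username>@snellius.surf.nl  # one-off alternative
```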

For more information about such a setup, check out this blog post