The Lisa system is a Beowulf cluster computer consisting of several hundred multi-core nodes running the Debian Linux operating system. The system is constantly evolving and growing to satisfy the needs of the participants.
At the moment Lisa has the following configuration:
| Nodes | Processor type | Clock speed | Scratch disk | Memory | CPUs per node | Cache per CPU | Cores per node | Accelerators | Network |
|---|---|---|---|---|---|---|---|---|---|
| 23 | bronze_3104 | 1.70 GHz | 1.5 TB NVMe | 256 GB, UPI 10.4 GT/s | 2 | 8.25 MB | 12 | 4 × GeForce GTX 1080 Ti, 11 GB GDDR5X | 40 Gbit/s Ethernet |
| 2 | bronze_3104 | 1.70 GHz | 1.5 TB NVMe | 256 GB, UPI 10.4 GT/s | 2 | 8.25 MB | 12 | 4 × Titan V, 12 GB HBM2 | 40 Gbit/s Ethernet |
| 29 | gold_5118 | 2.30 GHz | 1.5 TB NVMe | 192 GB, UPI 10.4 GT/s | 2 | 16.5 MB | 24 | 4 × Titan RTX, 24 GB GDDR6 | 40 Gbit/s Ethernet |
| 192 | gold_6130 | 2.10 GHz | 1.7 TB | 96 GB, UPI 10.4 GT/s | 1 | 22 MB | 16 | – | 10 Gbit/s Ethernet |
| 96 | silver_4110 | 2.10 GHz | 1.8 TB | 96 GB, UPI 9.6 GT/s | 2 | 11 MB | 16 | – | 10 Gbit/s Ethernet |
| 1 | gold_6126 | 2.60 GHz | 11 TB | 2 TB, UPI 10.4 GT/s | 4 | 19.25 MB | 48 | – | 40 Gbit/s Ethernet |
| 6 | gold_6230R | 2.10 GHz | 3 TB | 376 GB, UPI 10.4 GT/s | 2 | 35.75 MB | 52 | – | 2 × 25 Gbit/s Ethernet |
There are several file systems accessible on Lisa.
| File system | Quota | Speed | Shared between nodes | Mountpoint | Expiration | Backup |
|---|---|---|---|---|---|---|
| Home | 200 GB | Normal | Yes | /home/&lt;username&gt; | 15 weeks after project expiration | Nightly incremental |
| Scratch-local | 1.5–1.7 TB (node dependent) | Fast | No | /scratch | End of job | No |
| Scratch-shared | N.A. (size 3 TB) | Normal | Yes | /nfs/scratch | At most 14 days | No |
| Project | Based on request | Normal | Yes | /project/&lt;project_space_name&gt; | End of allocation, plus four weeks grace | No |
| Archive Service | N.A. | Very slow | Login nodes only | /archive/&lt;username&gt; | Project duration | Nightly |
The Home file system
When you log in to Lisa you start, by default, on the home file system. This is the regular file system where you can store your job scripts, datasets, etc. You can always access the home file system through the $HOME environment variable; for example, `ls -als $HOME` lists all the files and folders in your home folder.
The home file system contains the files you normally use. By default, your account is provisioned with a home folder of 200 GB. Your current usage is shown when you log in to Lisa, and you can also check it from the command line.
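One generic way to check, sketched below with standard Linux tools (Lisa may additionally provide a dedicated quota command), is `df`, which reports the size and usage of the file system holding your home directory:

```shell
# Report size, used and available space of the file system that
# contains your home directory. Note this shows file-system totals,
# not necessarily your personal quota.
df -h "$HOME"
```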
The home file system is a network file system that is available on all login and batch nodes, so your jobs can access it from every node. The downside is that the home file system is not particularly fast, especially in the handling of metadata: creating and deleting files, opening and closing files, many small updates to files, and so on.
Backup & restore
- We do nightly incremental backups.
- Files that are open at the time of backup will be skipped.
- We can restore files and/or directories that you accidentally removed, up to 15 days back, provided they already existed during the last successful backup.
The scratch file system
The scratch file system is intended as fast, temporary storage that can be used while running a job, and can be accessed by all users with a valid account on the system. Every compute node in the Lisa system contains a local disk for the scratch file system that can only be accessed by that particular node. There is no quota for the scratch file system; use of the scratch file system is eventually limited by the capacity of these disks (see table above). Scratch disks are not backed up and are cleaned at the end of a job.
Since the disks are local, read and write operations on to the scratch file system are much faster than on the home file system. This makes the scratch file system very suitable for I/O intensive operations.
The scratch disks in the GPU partition are NVMe SSDs, which are particularly fast and suitable for intensive I/O, making them a good fit for machine learning training sets.
You access the scratch file system through the environment variable $TMPDIR: this points to an existing directory on the local disk of each node. For example, to create a directory 'work' on the scratch file system and copy a file from the home file system to that directory:

```shell
mkdir "$TMPDIR"/work
cp my-file "$TMPDIR"/work
```

Note the use of "$TMPDIR" (with quotes) rather than $TMPDIR (without quotes): $TMPDIR can contain shell metacharacters (e.g. [ and ]), and the quotes make sure the shell leaves those characters as-is.
In addition to temporary storage that is local to each node (like scratch), you may need temporary storage that is shared among nodes. For this we have a shared scratch disk, accessible through /nfs/scratch. The size of this shared scratch space is currently 1 TB and there is no quota for individual users. Note that this shared scratch has two disadvantages compared to the local scratch disk:
- The speed of /nfs/scratch is similar to the home file system, and thus slower than the local scratch disk at "$TMPDIR".
- You share /nfs/scratch with all other users, and there may not be enough space to write all the files you want. Think carefully about how your job will behave if it tries to write to /nfs/scratch and there is insufficient space: it would be a waste of budget if the results of a long computation were lost because of it.
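One way to guard against this, sketched below with standard tools (the function name and the example threshold are our own, not part of the Lisa tooling), is to check the available space at the start of the job and fail fast:

```shell
# ensure_space DIR NEEDED_KB: succeed only if DIR has at least
# NEEDED_KB kilobytes free, using POSIX df output. Run it at the
# start of a job so the job aborts early instead of losing results
# halfway through a long computation.
ensure_space() {
    dir=$1
    needed_kb=$2
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    [ "$avail_kb" -ge "$needed_kb" ]
}

# Example: require at least 1 GB free on the shared scratch.
# ensure_space /nfs/scratch $((1024 * 1024)) || { echo "no space on /nfs/scratch" >&2; exit 1; }
```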
How to best use scratch
In general, the best way to use scratch is to copy your input files from your home to scratch at the start of a job, create all temporary files needed by your job on scratch (assuming they don't need to be shared with other nodes) and copy all output files at the end of a job back to the home file system. There are two things to note:
- Don't forget to copy your results back to the home file system! Scratch will be cleaned after your job finishes and your results will be lost if you forget this step.
- If you created files with the same filename on the scratch disks of different nodes, copying them back will result in a clash: you're trying to write two different files to the same filename on the home file system. Avoid this by including something unique to the host (e.g. the hostname, which you can retrieve with the `hostname` command) in the file or directory name.
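Putting the steps above together, a job script could contain a fragment like the following sketch (`gzip` stands in for your real program and a generated file for your real input; adapt the paths to your own job):

```shell
# Copy-in / compute / copy-out pattern for the local scratch disk.
workdir="${TMPDIR:-/tmp}/work.$$"     # per-job directory on local scratch

mkdir -p "$workdir"

# 1. Stage the input on the fast local disk. In a real job this would
#    be a copy from $HOME; here we generate a stand-in input file.
printf 'example data\n' > "$workdir/my-input"

# 2. Run the computation with its I/O on local scratch
#    (gzip is a placeholder for your real program).
gzip -c "$workdir/my-input" > "$workdir/output.gz"

# 3. Copy the results off scratch before the job ends and the disk is
#    cleaned; tag the file with the hostname so output from different
#    nodes cannot clash.
resultdir="${RESULTDIR:-$workdir/results}"   # in a real job: somewhere under $HOME
mkdir -p "$resultdir"
cp "$workdir/output.gz" "$resultdir/output.$(hostname).gz"
```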
The project file system
A project file system can be used:
- If you need additional storage space, but do not require a backup.
- If you need to share files within a collaboration.
By default accounts on our systems are not provisioned with a project space. It can be requested when you apply for an account, or by contacting our service desk (depending on the type of account different conditions may apply, contact us to know if your account is eligible for a project space).
A project space can be accessed at the location /project/&lt;project_space_name&gt;. Project quota are implemented as group quota, not as user quota; you can check how much free space you have using `df -h /project/<project_space_name>`.
To share files on the project file system, you need to make sure to write files with the correct file permissions. See the corresponding section in the documentation on how to do that.
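For example, assuming your collaborators are all members of the project's Unix group, one common approach (the directory path and group name below are placeholders) is to give the group read/write access and set the setgid bit on shared directories, so new files inherit the directory's group:

```shell
# Create a shared directory with group read/write access and the
# setgid bit, so files created inside it inherit the directory's
# group. "sharedir" is a placeholder; in practice use something like
# /project/<project_space_name>/shared.
sharedir="${SHAREDIR:-./shared}"
mkdir -p "$sharedir"
# chgrp <project_group> "$sharedir"   # substitute your project's group
chmod 2770 "$sharedir"                # owner+group rwx, setgid, no access for others
```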
Expiration of project space
A project space's duration coincides with the allocation within which it has been granted. When the agreed-upon period expires, the project space will be made inaccessible. If we receive no further notice from the project space users, we are entitled to delete all files and directories in the project space after a grace period of an additional four weeks.
Backup & restore
We do not make backups of project spaces, and thus cannot restore data. Users are responsible for making their own backups, if needed.
The archive file system
The Data Archive is not a traditional file system, nor is it specific to Lisa. It is an independent facility for long-term storage of data, which uses a tape storage backend and is accessible from other systems as well. For more information, see the separate page about the archive.
The Archive Service is intended for long-term storage of large amounts of data. Most of this data is (eventually) stored on tape, and therefore accessing it may take a while. If you are a user of the data archive, it is accessible at /archive/&lt;username&gt;. The archive is designed to handle only large files efficiently. If you want to archive many smaller files, compress them first into a single tar file before copying it to the archive. Never store a large number of small files on the archive: they may be scattered across different tapes, and retrieving them all at a later stage puts a large load on the archive. See this section of the documentation for more information on using the archive appropriately.
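For example, to bundle a directory of many small files into one compressed tar file before copying it to the archive (the directory name is a placeholder, and the copy itself is shown as a comment since it only applies on a login node):

```shell
# Pack many small files into a single compressed tar file; the archive
# handles one large file far better than thousands of small ones.
# "my-dataset" is a placeholder; here we generate stand-in data so the
# sketch is self-contained.
mkdir -p my-dataset
printf 'sample\n' > my-dataset/part-001
tar czf my-dataset.tar.gz my-dataset/

# On a login node, copy the single tar file to the archive:
# cp my-dataset.tar.gz /archive/"$USER"/
```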
Lisa Hostkey Fingerprints
When you log in to a new system for the first time with the SSH protocol, the system returns a hostkey fingerprint to you:
```
The authenticity of host 'lisa.surfsara.nl (184.108.40.206)' can't be established.
ED25519 key fingerprint is SHA256:UhVbfNE+O1oEjdLidcM9YS0hLHrO3tQYrVIo4BAqwNo.
ED25519 key fingerprint is MD5:a0:d5:6e:e6:41:41:8d:06:68:5a:1d:aa:03:7f:40:3b.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'lisa.surfsara.nl' (ED25519) to the list of known hosts.
```
Before you type "yes" to the question posed to you, verify this fingerprint against the list of correct fingerprints below to check that you are indeed connecting to the correct system.
ED25519 === SHA256:UhVbfNE+O1oEjdLidcM9YS0hLHrO3tQYrVIo4BAqwNo MD5:a0:d5:6e:e6:41:41:8d:06:68:5a:1d:aa:03:7f:40:3b
RSA === SHA256:8wVrNrBzU399UFktk3sNHvp6x2cjbhJBai5MRe10w8E MD5:b0:69:85:a5:21:d6:43:40:bc:6c:da:e3:a2:cc:b5:8b
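You can also compute fingerprints yourself with the standard OpenSSH tool `ssh-keygen`. The sketch below fingerprints a freshly generated throwaway key just to show the mechanics; to check Lisa itself before your first login, you would feed it the host key reported by `ssh-keyscan` (shown as a comment, since it needs network access):

```shell
# ssh-keygen -lf prints the SHA256 fingerprint of a public key file.
# Demonstrated here on a throwaway key generated on the spot:
ssh-keygen -t ed25519 -N '' -f demo_key -q
ssh-keygen -lf demo_key.pub

# To fingerprint Lisa's actual host key before your first login:
# ssh-keyscan -t ed25519 lisa.surfsara.nl | ssh-keygen -lf -
```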
Installation and maintenance of the system
Most of the software that is used to manage the Lisa system is Open Source. We are using the following software to manage the system:
- CFEngine 3, a configuration engine
- As batch software we use Slurm.
- For monitoring we use Prometheus.
SALI (Sara Automatic Linux Installer) is a tool that allows you to install Linux on multiple machines at once. It supports several protocols, such as BitTorrent and rsync, for downloading the data needed to install a machine. SALI originates from SystemImager and still uses the same philosophy: a scalable method for performing unattended installations, mostly used in cluster setups.