Trouble with connecting to Snellius

A few common issues can prevent you from connecting to Snellius.

Usage Agreement

In order to make use of the Snellius service, you need to read and accept the Usage Agreement.

To do so, visit https://portal.cua.surf.nl/ and log in with your login and password.

The system is down

Please check the system status page.

You are temporarily banned

You will receive a 24-hour ban on a login node after 5 failed login attempts. The interactive nodes are protected by fail2ban. Because there are multiple login nodes, you can be banned on one login host but not on another.

We encourage you to set up SSH public keys to access Snellius. See this information on how to upload your public key to Snellius.

Attempting to connect from a non-whitelisted IP

The interactive nodes only accept (GSI-)ssh connections from known, whitelisted IP ranges. You may be trying to connect from an IP address that is not in a whitelisted range.

As a result, you might find that the system cannot be accessed while traveling. In that case, please use the doornode.

If you need access from a location that you will use regularly and long-term, please contact us through the service desk with your external IP address. Please take care that you report the CORRECT public IP address. As many sites nowadays use private IP space and a network address translation scheme, your public IP address is NOT necessarily an address that is configured directly on your local system and hence not necessarily known to your system. The following ranges are by definition private IP address ranges that cannot be whitelisted:

  • 192.168.0.0 - 192.168.255.255
  • 172.16.0.0 - 172.31.255.255
  • 10.0.0.0 - 10.255.255.255

You can easily find out which public IP address you are using by visiting https://echoip.cua.surf.nl with your web browser.
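If you are unsure whether an address falls in one of the private ranges listed above, a quick check can be scripted. The following is a minimal sketch using Python's standard ipaddress module. The function name is_private_range and the sample addresses are only illustrative and are not part of any Snellius tooling.

```python
import ipaddress

# The three private ranges listed above, written in CIDR notation.
PRIVATE_RANGES = [
    ipaddress.ip_network("192.168.0.0/16"),  # 192.168.0.0 - 192.168.255.255
    ipaddress.ip_network("172.16.0.0/12"),   # 172.16.0.0 - 172.31.255.255
    ipaddress.ip_network("10.0.0.0/8"),      # 10.0.0.0 - 10.255.255.255
]


def is_private_range(address: str) -> bool:
    """Return True if the address lies in one of the private ranges above."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in PRIVATE_RANGES)


if __name__ == "__main__":
    # Illustrative addresses only; replace with the address you want to check.
    for candidate in ("192.168.1.10", "203.0.113.5"):
        kind = "private, cannot be whitelisted" if is_private_range(candidate) else "not private"
        print(f"{candidate}: {kind}")
```

An address reported as private here is a local address, so the public address shown by the web page above is the one to report.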

Using the doornode

At times you'll find yourself on the road with a good idea that you would like to test with a simulation on Snellius. When you try to log in, you'll often find that access is not possible, which can be quite frustrating. The problem is that Snellius uses a whitelist of IP addresses, and you can only access the system from those locations. To help you in these situations, we have set up a separate login server that can be accessed from anywhere in the world: doornode.hpcv.surf.nl (thus using `ssh user@doornode.hpcv.surf.nl`). This server can be accessed with your usual login and password, after which you get a menu of systems that you can log in to. Select 'Snellius' and type your password a second time. You are now logged in to Snellius. Please note that you cannot copy files or use X11 when using the doornode.

The doornode has the following hostkey fingerprints:

  • ed25519
    • MD5:12:e3:c8:9b:82:4f:7d:59:79:c2:47:fa:4b:46:4a:5b
    • SHA256:eEtqOPM6HP4MagLwVFVwfbeFBmepj4oL84DUFkfhnPE
  • ecdsa
    • MD5:e3:6f:87:44:15:4e:76:b9:e1:fc:3c:50:99:ee:ed:06
    • SHA256:/FrDkx3GNb1i3Bb677V06NiS580pMTb2RT0nzotcpjc
  • rsa
    • MD5:a3:55:8a:4f:cd:70:40:89:71:fc:99:6e:39:97:65:5d
    • SHA256:VgLSSrcKTJ30kuuIcrYSf0W01KZ69PTz2LJZ+Za5iHU
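To compare a host key against the SHA256 fingerprints above, it helps to know how such a fingerprint is derived: it is the SHA-256 digest of the decoded public key blob, base64-encoded without trailing '=' padding. The sketch below illustrates this in Python. The function names are hypothetical, and the key line you pass in is assumed to be in the usual "<type> <base64 blob> [comment]" public key format.

```python
import base64
import hashlib

# ed25519 fingerprint of the doornode, copied from the list above.
DOORNODE_ED25519 = "SHA256:eEtqOPM6HP4MagLwVFVwfbeFBmepj4oL84DUFkfhnPE"


def sha256_fingerprint(public_key_line: str) -> str:
    """Return the SHA256 fingerprint of a '<type> <base64 blob> [comment]' line."""
    blob = base64.b64decode(public_key_line.split()[1])
    digest = hashlib.sha256(blob).digest()
    # The digest is printed base64-encoded, with the '=' padding stripped.
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


def matches_doornode(public_key_line: str) -> bool:
    """Compare an ed25519 public key line with the published fingerprint."""
    return sha256_fingerprint(public_key_line) == DOORNODE_ED25519
```

In practice your ssh client prints the fingerprint when you connect for the first time, so comparing that output by eye with the list above is usually sufficient.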

How to disconnect

Simply issue the command

logout

or

exit

in the terminal window. Do not forget to press 'Enter' after the command.

More information

More information about using Linux systems in general can be found on the web, for example:

  • The UNIX Tutorial for Beginners contains a useful introduction to Unix. NOTE: some examples (especially those about variables) are for another shell (csh) than the default shell on Snellius (bash).
  • The Advanced Bash-Scripting Guide gives an in-depth but readable overview of the usage of the standard login shell 'bash', with examples.


Data management policy

Expired home directories and project spaces will be deleted

The SURFsara Usage Agreement states that data will be removed within 6 months after the expiration date of an agreement (Contract, Project Agreement, NWO (EInfra)grant, etc.). If a login (and its home directory) has no association for longer than 15 weeks with an active account/budget on the basis of which access to our systems is granted, we will delete the login and its home directory.

Data access granted to others

In some cases, owners of home directories have granted others access to their data via group memberships or "access control lists" (ACLs). If the others still need these data, they need to take action to preserve the data for themselves.

Project spaces

For project spaces, by and large, the same applies as for home directories. The differences stem from the fact that project spaces, unlike home directories, are created as collectively owned by the logins that are members of a disk quota group. Project space allocation is an integral part of NWO grants. When the NWO grant (or other contractual basis) expires, and there is no new or prolonged grant or contract within 15 weeks, the project space expires as well and will be cleaned up. If there is a new grant or prolongation arrangement, the project space will remain. However, logins that are no longer associated with the new account will be removed from the quota group. Files in the project space associated with the UID of an expired login should be assigned to another group member that is still active.

Principal investigators of an account are warned 90, 60 and 30 days before their account expires, so there is enough time to take appropriate measures before the expiration date.
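To plan around these periods, the relevant dates can be worked out from the expiration date of the account. The sketch below is only a date calculation based on the intervals mentioned above (warnings 90, 60 and 30 days before expiry, and the 15-week period without an active account). The function names and the example date are hypothetical, and the actual dates are determined by SURF, not by this calculation.

```python
from datetime import date, timedelta

WARNING_DAYS = (90, 60, 30)   # warnings sent this many days before expiry
GRACE_WEEKS = 15              # period without an active account/budget


def warning_dates(expiry: date) -> list[date]:
    """Dates on which the warnings mentioned above would fall."""
    return [expiry - timedelta(days=days) for days in WARNING_DAYS]


def end_of_grace_period(expiry: date) -> date:
    """Date 15 weeks after expiry; assumes no new grant or contract follows."""
    return expiry + timedelta(weeks=GRACE_WEEKS)


if __name__ == "__main__":
    expiry = date(2025, 6, 30)  # hypothetical expiration date
    for day in warning_dates(expiry):
        print("warning expected around", day.isoformat())
    print("grace period ends on", end_of_grace_period(expiry).isoformat())
```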



Questions about running jobs

I expect output from my program, but no output is generated

Program output is usually buffered before it is written to, for example, stdout. You can disable output buffering in most languages.

Please re-enable buffering once you are done debugging, as unbuffered output will negatively affect the performance of your program.

Python

You can enable unbuffered output with Python's built-in -u flag:

python -u <main.py>

or by setting the environment variable PYTHONUNBUFFERED :

PYTHONUNBUFFERED=1 python <main.py>
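If you only need a few progress messages to appear immediately, an alternative to disabling buffering for the whole program is to flush those messages explicitly. The sketch below uses the flush argument of Python's built-in print function. The helper name report_progress and the message format are only illustrative.

```python
import sys


def report_progress(step: int, total: int, stream=None) -> str:
    """Write one progress line and flush it right away."""
    stream = stream if stream is not None else sys.stdout
    line = f"step {step}/{total} done"
    # flush=True pushes this line out immediately; other output stays buffered.
    print(line, file=stream, flush=True)
    return line


if __name__ == "__main__":
    for step in range(1, 4):
        report_progress(step, 3)
```

This keeps the bulk of the output buffered, so the performance penalty mentioned above applies only to the lines that are flushed.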

Fortran 

If your program is compiled with the GNU gfortran compiler, set the following environment variable:

export GFORTRAN_UNBUFFERED_ALL=y

C

For C programs, the buffering can be changed using the function setvbuf. For example, standard output can be made unbuffered using:

#include <stdio.h>
...
setvbuf(stdout, NULL, _IONBF, 0);

My job doesn't start and has status 'ReqNodeNotAvail'

This usually happens when maintenance is planned. You can see planned maintenance on the system status page or in the message of the day when logging in to Snellius. Jobs whose maximum wall clock time extends beyond the start of the maintenance will not start until after the maintenance, and are shown with status 'ReqNodeNotAvail' in the squeue output. A workaround is to request a shorter maximum wall clock time.
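To pick a wall clock time that still fits before the maintenance, you can compute the time remaining until its announced start. The sketch below does this arithmetic in Python and formats the result as days-hours:minutes:seconds, a time format accepted by Slurm. The function name, the safety margin and the example times are assumptions for illustration. Note that the job also has to wait in the queue, so the time actually available may be shorter.

```python
from datetime import datetime, timedelta


def max_walltime(now: datetime, maintenance_start: datetime,
                 margin: timedelta = timedelta(minutes=10)):
    """Longest wall clock time that ends before maintenance, or None."""
    available = maintenance_start - now - margin
    if available <= timedelta(0):
        return None  # maintenance starts too soon (or has already started)
    total_seconds = int(available.total_seconds())
    days, rest = divmod(total_seconds, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"


if __name__ == "__main__":
    # Hypothetical maintenance start; take the real one from the status page.
    start = datetime(2024, 1, 2, 10, 30)
    print(max_walltime(datetime(2024, 1, 1, 8, 0), start))
```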

I can't use CVS

Snellius does not support the default remote shell 'rsh' for security reasons. Please use:

export CVS_RSH=ssh


How can I determine the memory usage of my application?

The SLURM batch scheduler logs the memory usage of your application, which can be retrieved after your job has ended. The command

job-statistics -j <JOB_ID>

shows the average and maximum memory use per MPI task, which MPI task used the maximum memory, and on which node.

Example usage might look like this:

$ job-statistics -j 1155623
...
              AveRSS :  11077K
              MaxRSS :  11576K
          MaxRSSTask :  46
          MaxRSSNode :  tcn828
...
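If you want to process these numbers further, for example to compare several jobs, the "name : value" lines can be parsed with a few lines of Python. The sketch below works on the sample output shown above. The function names are hypothetical, and the assumption that a trailing K means units of 1024 bytes should be checked against the tool's documentation.

```python
SAMPLE_OUTPUT = """\
...
              AveRSS :  11077K
              MaxRSS :  11576K
          MaxRSSTask :  46
          MaxRSSNode :  tcn828
...
"""


def parse_job_statistics(text: str) -> dict[str, str]:
    """Collect 'name : value' lines into a dictionary; skip all other lines."""
    fields = {}
    for line in text.splitlines():
        name, separator, value = line.partition(":")
        if separator and name.strip() and value.strip():
            fields[name.strip()] = value.strip()
    return fields


def rss_to_mib(value: str) -> float:
    """Convert a value such as '11576K' to MiB (assumes K = 1024 bytes)."""
    factors = {"K": 1 / 1024, "M": 1.0, "G": 1024.0}
    return float(value[:-1]) * factors[value[-1].upper()]


if __name__ == "__main__":
    stats = parse_job_statistics(SAMPLE_OUTPUT)
    print(f"MaxRSS on {stats['MaxRSSNode']}: {rss_to_mib(stats['MaxRSS']):.1f} MiB")
```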

If you want to print the memory usage as part of your application, note that the system call getrusage() is not fully implemented on Linux; see 'man 2 getrusage'. Instead, compile and use the C routine printmem() (listed below), which prints the memory usage.

C routine printmem()
#include <stdio.h>
#include <unistd.h>   /* getpid(), sysconf() */

/* Print the total program size in KB and return it, or -1 on failure. */
int printmem(void)
{
    char buf[64];
    unsigned long pages;   /* total program size, in pages */
    long kb = -1;

    snprintf(buf, sizeof buf, "/proc/%ld/statm", (long)getpid());
    FILE *pf = fopen(buf, "r");
    if (pf) {
        /* statm fields, all in pages: size resident shared text lib data dt.
           Only the first field (size) is read here. */
        if (fscanf(pf, "%lu", &pages) == 1) {
            /* convert pages to KB using the system page size */
            kb = (long)(pages * ((unsigned long)sysconf(_SC_PAGESIZE) / 1024));
            printf("KB used: %ld\n", kb);
        }
        fclose(pf);
    }
    return (int)kb;
}
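The same /proc file can also be read from a Python application. The sketch below is a rough Python counterpart of the C routine and is not part of the Snellius software. The function name memory_usage_kb is hypothetical, and the code assumes a Linux system where /proc/self/statm reports sizes in pages.

```python
import os


def memory_usage_kb():
    """Return (total size, resident set size) in KB, or None if unavailable."""
    try:
        with open("/proc/self/statm") as statm:
            fields = statm.read().split()
    except OSError:
        return None  # not on Linux, or /proc is not mounted
    page_kb = os.sysconf("SC_PAGE_SIZE") // 1024
    # First field: total program size; second field: resident set size.
    return int(fields[0]) * page_kb, int(fields[1]) * page_kb


if __name__ == "__main__":
    usage = memory_usage_kb()
    if usage is not None:
        print(f"KB used: {usage[0]} (resident: {usage[1]})")
```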


Which nodes are allocated to my job?

The environment variable $SLURM_NODELIST contains the names of the nodes, in a compact format such as tcn[9006-9008]. Using the program scontrol you can obtain the node names, one name per line:

$ scontrol show hostnames
tcn9006
tcn9007
tcn9008

Using the command nodeset, you get all node names from the $SLURM_NODELIST variable on one line:

$ nodeset -e $SLURM_NODELIST
tcn9006 tcn9007 tcn9008
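If you need the node names inside a Python script, the compact format can also be expanded there. The sketch below handles only the simple case of one prefix followed by a single bracketed list of numbers and ranges, as in the example above. The function name expand_nodelist is hypothetical, and for anything more complex the scontrol and nodeset commands remain the reliable choice.

```python
import re


def expand_nodelist(nodelist: str) -> list[str]:
    """Expand e.g. 'tcn[9006-9008]' into individual node names."""
    match = re.fullmatch(r"([^\[\],]+)\[([0-9,\-]+)\]", nodelist)
    if match is None:
        return [nodelist]  # a single plain name, or a format not handled here
    prefix, body = match.groups()
    names = []
    for part in body.split(","):
        first, _, last = part.partition("-")
        last = last or first          # a single number, not a range
        width = len(first)            # keep any zero padding
        for number in range(int(first), int(last) + 1):
            names.append(f"{prefix}{number:0{width}d}")
    return names


if __name__ == "__main__":
    import os
    # Falls back to the example value when run outside a job.
    print(expand_nodelist(os.environ.get("SLURM_NODELIST", "tcn[9006-9008]")))
```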

What does maintenance mean?

A few times per year, the 'message of the day' (the message you get when you log in) will announce that maintenance is planned. During this period the system will be upgraded or adapted.

Consequences for you:

  • During maintenance, you cannot log in
  • Jobs that would still be running at the start of the maintenance will not be started


Miscellaneous

Can I receive mail on my login?

No, you can't receive messages from outside the system. The batch nodes can send mail to your login, but to read it you have to forward mail sent to your login to an external address, using the $HOME/.forward file.

The Slurm option --mail-user sets the address for job notifications:

--mail-user=me@home.nl

To send a message from within your job, put the following line in your job script:

echo "Job $SLURM_JOBID started at $(date)" | mail -s "Job $SLURM_JOBID" "$USER"

and edit the file $HOME/.forward, for example:

me@home.nl

What information should be present in NWO Small Requests (Pilot grants) for Snellius?

We expect certain information to be present in NWO small grant applications. Including this information up front helps us to evaluate grants quickly and efficiently, which in turn results in shorter processing times for the applicant and fewer follow-up questions. Please refer to this page for tips on which details we require in the application form, and see the examples on that page.

How do I acknowledge SURF for the use of Snellius and the support I received?

We would appreciate it if you included a text like this in publications about projects in which Snellius played a role:

We thank SURF (www.surf.nl) for the support in using the National Supercomputer Snellius.