The SURF Data Archive provides efficient archiving of large datasets by storing files on tape. Data can be ingested either via scp-based clients or via high-performance tools.

File transfer applications

Data can be transferred to the Data Archive from other infrastructures (Lisa and Snellius) and from online storage systems using for example Cyberduck (MacOS) or FileZilla and WinSCP (Windows).

To connect to the Data Archive, use the SFTP protocol (server name: archive.surfsara.nl) with your CUA credentials. Open another window to log in to your other online storage service. Once the two connections are established in the two windows, you can simply drag and drop data between the Data Archive and the other service. Avoid transferring many small files to the archive, and note that downloading data that is not staged might slow down transfers significantly.
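One common way to avoid transferring many small files is to bundle them into a single tar file before uploading. The sketch below is only an illustration: the directory name results and the part_N.txt files are made up, and everything happens in a temporary directory.

```shell
# Illustrative only: create a throwaway directory with a few small files.
workdir=$(mktemp -d)
mkdir "$workdir/results"
for i in 1 2 3 4 5; do
    echo "sample $i" > "$workdir/results/part_$i.txt"
done

# Pack the whole directory into one compressed tar file.
# -C makes the paths inside the tar file relative to $workdir.
tar -czf "$workdir/results.tar.gz" -C "$workdir" results

# List the contents to check the tar file before uploading it.
file_count=$(tar -tzf "$workdir/results.tar.gz" | grep -c 'part_')
echo "files in tar file: $file_count"
```

You would then upload the single results.tar.gz file instead of the individual files.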

High-performance transfer protocols

Because the data files are large, transfers to and from the service can take a long time with normal transfer protocols like SSH or SFTP. To obtain high-speed transfers over your network to the Data Archive, special protocols can be employed that fully utilise the network bandwidth of your connection.

Very important

  • Using the HPN-SSH protocol with the NONE cipher means your data transfers are no longer encrypted. Please be aware of this
  • Authentication, however, still takes place over an encrypted connection
  • Files on the Data Archive can be offline and then need to be staged before you can download them from the archive. To learn how to do this, please refer to the DMF documentation.

Things to note

  • Using the high-performance protocols requires the installation and configuration of some tools
  • All commands are issued on the HPC system after logging in (see SSH Usage)
  • The use of the GridFTP transfer protocol requires the administrators of the Data Archive to enable it for your specific user

HPN-SSH protocol

The High-Performance SSH/SCP (or HPN-SSH) protocol is an optimized version of the commonly used SSH protocol that enables high-performance data transfers using the full network bandwidth of your internet connection. Technically speaking, normal SSH transfers use a small fixed packet window size and slow data encryption ciphers, which make data transfers inefficient and CPU-hungry. For more in-depth information, refer to the Pittsburgh Supercomputing Center documentation page.

SURFsara supports the use of the HPN-SSH protocol from and to the Data Archive using a specially adapted version of the OpenSSH tools. These tools need to be installed on the client machine from which you want to make your data transfers.

Installation

To install the OpenSSH tools with high-performance optimizations on your client machine, follow the guide below for the operating system you are using. Effectively, the guides make you install an additional OpenSSH installation next to the one provided by your OS, if one is provided at all. The optimized version works exactly the same as the normal version but supports additional options that enable high-performance transfers.

Linux

Installation on Linux requires the download of the OpenSSH code and patching it with the HPN-SSH modifications. You might need to install a compiler like gcc. Use your package manager for this.

  • Download the latest supported portable OpenSSH (here version 8.4p1) code:
wget https://ftp.nluug.nl/pub/OpenBSD/OpenSSH/portable/openssh-8.4p1.tar.gz
  • Untar it:
tar -xzf openssh-8.4p1.tar.gz
  • Download the latest HPN-SSH patch (here version 8.4p1):
wget https://iweb.dl.sourceforge.net/project/hpnssh/Patches/HPN-SSH%2015v1%208.4p1/openssh-8_4_P1-hpn-15.1.diff
  • Enter the OpenSSH folder and patch the OpenSSH code files:
cd openssh-8.4p1
patch -p1 < ../openssh-8_4_P1-hpn-15.1.diff
  • Configure your target directory where the tools will be installed, e.g.:
./configure --prefix=/opt/hpn-ssh
  • If configure fails to find the OpenSSL header files, point it to them using an additional option, e.g.:
./configure --prefix=/opt/hpn-ssh --with-ssl-dir=/usr/local/opt/openssl
  • Make and install the tools:
make && make install

Use sudo when needed. Your HPN-enabled OpenSSH tools are now installed.
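To check the result, you can ask the newly installed ssh binary for its version. The helper below is a minimal sketch: the function name check_hpn_install is made up, and /opt/hpn-ssh is simply the example prefix used in the steps above. A patched build typically reports "hpn" as part of its version string, but treat that as an assumption and compare with your own output.

```shell
# Hypothetical helper: report whether an ssh binary exists under a prefix
# and, if so, print its version string.
check_hpn_install() {
    prefix=${1:-/opt/hpn-ssh}
    if [ -x "$prefix/bin/ssh" ]; then
        # ssh -V writes the version to stderr, so redirect it to stdout.
        "$prefix/bin/ssh" -V 2>&1
    else
        echo "no ssh binary found under $prefix/bin" >&2
        return 1
    fi
}

check_hpn_install /opt/hpn-ssh || echo "HPN-SSH is not installed at /opt/hpn-ssh"
```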

MacOS

MacOS installation follows the same procedure as the Linux installation. You can use any package manager of your choice. For inexperienced users we recommend following the step-by-step guide for the Linux installation. In case you don't have the wget program, you can install it using:

brew install wget openssl

Then continue with the step-by-step instructions for Linux.

Windows

High-performance transfers are possible using the PuTTY or WinSCP file transfer tools. Please refer to their documentation on how to install these tools.

Transfer your data

Data transfers can now be made from your terminal using the commands below.

Please note:

  • Files on the Data Archive can be offline and then need to be staged before you can download them from the archive to your machine
  • The HPN-SSH protocol is enabled on Data Archive through port 2222
  • You need to provide some options with the command that enable the efficient transfers and are not available in the normal version of OpenSSH:
Option        Description                                        Value
NoneEnabled   Allows the NONE cipher to be used                  yes
NoneSwitch    Switches to the NONE cipher after authentication   yes

Linux

To transfer data using the HPN-SSH protocol to the Data Archive service on Linux do the following:

/opt/hpn-ssh/bin/scp -P 2222 -oNoneEnabled=yes -oNoneSwitch=yes <yourfile> <username>@archive.surfsara.nl:<target name>

Reverse the last two arguments to transfer data from the Data Archive to your machine. Make sure to stage a file first before you attempt to download it (see DMF documentation). The transfer output will warn you that the cipher is set to 'NONE'. This is intended behaviour:

WARNING: ENABLED NONE CIPHER
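For downloads, the two steps (stage, then copy) can be sketched as follows. This helper only prints the commands and does not run them. It assumes that the DMF dmget command is available to you on the Data Archive and that you reach the archive over normal SSH for staging, so check the DMF documentation for the exact procedure. The function name print_archive_download and the user jdoe are made up for the example.

```shell
# Hypothetical helper: print (not run) the commands for a staged download.
# Arguments: <username> <path on the archive> <local target>
print_archive_download() {
    user=$1
    remote=$2
    target=$3
    # Step 1: recall the file from tape with the DMF dmget command.
    printf 'ssh %s@archive.surfsara.nl dmget %s\n' "$user" "$remote"
    # Step 2: fetch it with the HPN-enabled scp on port 2222.
    printf '/opt/hpn-ssh/bin/scp -P 2222 -oNoneEnabled=yes -oNoneSwitch=yes %s@archive.surfsara.nl:%s %s\n' \
        "$user" "$remote" "$target"
}

print_archive_download jdoe data/run1.tar ./run1.tar
```

Printing the commands first lets you review them before running them by hand.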

MacOS

To transfer data using the HPN-SSH protocol to the Data Archive service on MacOS do the following:

/opt/hpn-ssh/bin/scp -P 2222 -oNoneEnabled=yes -oNoneSwitch=yes <yourfile> <username>@archive.surfsara.nl:<target name>

Reverse the last two arguments to transfer data from the Data Archive to your machine. Make sure to stage a file first before you attempt to download it (see DMF documentation). The transfer output will warn you that the cipher is set to 'NONE'. This is intended behaviour:

WARNING: ENABLED NONE CIPHER

Tips and tricks

Alias

To make using the installed tools a little bit easier, you can add an alias to your MacOS or Linux shell that will automatically use the HPN-SSH version of the scp tool together with the necessary options. Furthermore, you can add your identity file (-i option) so you don't have to enter your password every time. For Linux enter the following command:

alias hpnscp='/opt/hpn-ssh/bin/scp -P 2222 -i <identityfile> -oNoneEnabled=yes -oNoneSwitch=yes'

For MacOS enter:

alias hpnscp='/opt/hpn-ssh/bin/scp -P 2222 -i <identityfile> -oNoneEnabled=yes -oNoneSwitch=yes'

Similarly to using the normal scp tool, you can now simply enter:

hpnscp <yourfile> <username>@archive.surfsara.nl:<target name>

to transfer your files.

The identity file, user name and/or port number can also be configured in the SSH configuration file, see for example this tutorial.
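As a sketch of what such a configuration could look like, the snippet below writes a small client configuration to a separate file. The host alias archive-hpn is made up, and the placeholders need to be replaced with your own values. A separate file is used on the assumption that the stock ssh client does not recognise the NoneEnabled and NoneSwitch options and would reject them in your regular configuration.

```shell
# Write an illustrative client configuration to its own file, so that
# the stock ssh client never reads the HPN-specific options.
config_file=$(mktemp)
cat > "$config_file" <<'EOF'
# "archive-hpn" is a made-up alias; replace the placeholders below.
Host archive-hpn
    HostName archive.surfsara.nl
    Port 2222
    User <username>
    IdentityFile <identityfile>
    NoneEnabled yes
    NoneSwitch yes
EOF

# Show (but do not run) how the HPN-enabled scp would use this file via -F.
echo "/opt/hpn-ssh/bin/scp -F $config_file <yourfile> archive-hpn:<target name>"
```

For real use you would store the file in a permanent location instead of a temporary one.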

Auto alias

To avoid having to define this alias for every terminal session, you can add it to a file ~/.bash_aliases. This file will automatically be loaded by the ~/.bashrc file upon opening your terminal, depending on your operating system. If not, make sure to add the following lines to ~/.bashrc:

if [ -f ~/.bash_aliases ]; then
    . ~/.bash_aliases
fi

Changes will take effect after you restart your terminal session or run the file yourself.
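If you add the alias from a script, it helps to append it only when it is not there yet, so repeated runs do not create duplicates. The helper below is a minimal sketch: the function name add_alias_once is made up, and the demonstration writes to a scratch file rather than to your real ~/.bash_aliases.

```shell
# Hypothetical helper: append a line to a file unless the exact line
# is already present (-x: whole line, -F: fixed string, no regex).
add_alias_once() {
    alias_file=$1
    alias_line=$2
    touch "$alias_file"
    if ! grep -qxF -e "$alias_line" "$alias_file"; then
        printf '%s\n' "$alias_line" >> "$alias_file"
    fi
}

# Demonstration on a scratch file; pass "$HOME/.bash_aliases" for real use.
scratch=$(mktemp)
line="alias hpnscp='/opt/hpn-ssh/bin/scp -P 2222 -i <identityfile> -oNoneEnabled=yes -oNoneSwitch=yes'"
add_alias_once "$scratch" "$line"
add_alias_once "$scratch" "$line"   # second call leaves the file unchanged
cat "$scratch"
```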

GridFTP

GridFTP is supported by the archive. To transfer data to the archive using the GridFTP protocol you need access to the globus-url-copy command or uberftp. These clients are available on the grid user interface machine (ui.grid.surfsara.nl) and on Snellius.

For information on how to obtain a (DigiCert) certificate, please refer to the Grid documentation on certificates.

Transfer data using Globus client

The Globus client is installed on our Archive service and you can use the globus-url-copy tool for transferring data from the Archive. globus-url-copy supports the gsiftp:// (GridFTP), ftp://, http://, https://, and file:// protocol specifiers in the URL. For more technical guidance on using the globus-url-copy tool, please visit the Globus client documentation page.

Connection of Grid Certificate to Archive usage 

To connect your grid certificate with your user account on the archive server, you must contact the SURFsara helpdesk and send them the distinguished name (DN) from your grid certificate. If you followed the grid installation guide, your certificate should be located in the $HOME/.globus folder in your UI account. To access the needed information, log in to the UI:

ssh <username>@ui.grid.sara.nl

From the UI, initialize the proxy with:

voms-proxy-init --voms pvier

You will need to authenticate with the password you used to export the certificate. If this was successful you should be able to read out the DN information with the following command:

voms-proxy-info

Among other information, you will find the subject line, which has a format similar to this:

Issuer: /DC=org/DC=terena/DC=tcs/C=NL/O=<YourOrg> B.V./CN=<YourName> Your@mail.nl

Once you have this information send a mail to our helpdesk with this line and refer to GridFTP usage. Your connection will be set up. You will receive a mail when this is done. 
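To pick out just the DN for your mail, you can filter the output for the relevant line. The helper below is a minimal sketch: the function name proxy_field is made up, and the sample input is only the example line shown above, not real output. With a valid proxy you would pipe the output of voms-proxy-info into it instead.

```shell
# Hypothetical helper: print the value of one "name: value" field read
# from stdin. The field name is matched case-insensitively.
proxy_field() {
    grep -i "^$1[[:space:]]*:" | head -n 1 | sed 's/^[^:]*:[[:space:]]*//'
}

# Illustrative input in the format shown above, not real output.
sample='Issuer: /DC=org/DC=terena/DC=tcs/C=NL/O=<YourOrg> B.V./CN=<YourName> Your@mail.nl'
printf '%s\n' "$sample" | proxy_field issuer
```

The same function works for other fields, for example by passing subject as the argument.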

Usage of GridFTP

The Data Archive GridFTP server is archive2.surfsara.nl. Once the connection is set up, you can use the globus-url-copy command to list or transfer files to the Archive. For example, to list files in a directory:

globus-url-copy -list gsiftp://archive2.surfsara.nl/home/<username>/

Copy a data file from the grid UI to the archive:

globus-url-copy file:///home/<username>/<filename.tar> gsiftp://archive2.surfsara.nl/archive/<username>/<filename>
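Note that the file:// URL in the copy command above contains an absolute path. The helper below sketches how to build that command from a relative file name. It only prints the command and does not run it. The function name print_gridftp_upload, the user jdoe and the file data.tar are made up, and the target layout /archive/<username>/ is taken from the example above.

```shell
# Hypothetical helper: print (not run) a globus-url-copy upload command.
# Arguments: <username> <local file>
print_gridftp_upload() {
    user=$1
    local_file=$2
    # file:// URLs need an absolute path, so resolve the directory first.
    abs_dir=$(cd "$(dirname "$local_file")" && pwd) || return 1
    name=$(basename "$local_file")
    printf 'globus-url-copy file://%s/%s gsiftp://archive2.surfsara.nl/archive/%s/%s\n' \
        "$abs_dir" "$name" "$user" "$name"
}

# Demonstration with a scratch file in a temporary directory.
demo_dir=$(mktemp -d)
touch "$demo_dir/data.tar"
print_gridftp_upload jdoe "$demo_dir/data.tar"
```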

Use the -help option for more details:

globus-url-copy -help

Please also read our guidelines for storing data and do not store small files.