Synopsis

This documentation is for researchers on Snellius who would like to use Jupyter Notebooks for their research. If you are a teacher using our "Jupyter for Education" service for a course, please refer to the documentation for that service.


Running Jupyter Notebooks using Snellius' JupyterHub

You can start your own Jupyter Notebook by following the instructions in "Running Jupyter Notebooks on Snellius using your own batch script". However, Snellius also runs a JupyterHub, which can start a Jupyter Notebook Server for you and connect you to it. This is generally much easier.

Starting a Jupyter Server using Snellius' JupyterHub

To start a Jupyter Notebook on Snellius using the JupyterHub:

  1. Go to the web address of the JupyterHub, e.g. https://jupyter.snellius.surf.nl/2022 if you want to run in the '2022' module environment. The JupyterHub for the previous environment, https://jupyter.snellius.surf.nl/2021, will remain available for the foreseeable future.
  2. Log in using your regular Snellius credentials.
  3. Select from the dropdown menu whether you want to start a Jupyter Server on a CPU or GPU node and click 'Start'. Note that Jupyter Server jobs started through the hub are accounted like any other job, and that GPU nodes are more expensive.

For the 2022 JupyterHub, the Jupyter Server can run two applications: either the classic Jupyter Notebook, or the JupyterLab environment. For the 2021 JupyterHub, only the classic Jupyter Notebook environment is supported. The 2022 JupyterHub by default redirects you to the JupyterLab environment, but you can change this (see instructions below). 

Note that we keep a limited set of dedicated resources free for these Jupyter Servers so that they start quickly. In general, this means that the Jupyter Server will either start almost immediately (within 1 minute) or the job will time out (if the dedicated resources are already fully used by other users). If the job times out, please try again later when fewer users are using the system. It also means that the runtime of the Jupyter jobs is limited to 1 hour.

Stopping a Jupyter Server from the JupyterHub

By default, the jobs launched by the JupyterHub will remain running until they hit their maximum walltime, even if you close your browser. The advantage is that you can reopen your browser window and continue your running session. The downside is that you'll keep paying SBUs for the full duration of the Jupyter job, unless you explicitly stop the Jupyter Server job.

The way to do this differs slightly between the classic Jupyter Notebook interface and the JupyterLab interface:

  • In the classic Jupyter Notebook interface, click the 'Control Panel' button in the top right. Then, click 'Stop My Server'. This cancels the corresponding batch job.
  • In the JupyterLab interface, click "File", then "Hub Control Panel". That opens a new browser window, where you can click "Stop My Server" to stop the Jupyter Server. This cancels the corresponding batch job.
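
If you prefer the command line, you can also inspect and cancel jobs from a Snellius login node with the standard Slurm commands squeue and scancel. The sketch below assumes that the job started by the JupyterHub shows up in the queue under your own user name; the helper name list_my_jobs and the job id are made up for this example.

```shell
#!/bin/bash
# Hypothetical helper: list your own jobs with job id, name, state,
# elapsed time and time limit (standard squeue format fields).
list_my_jobs() {
    squeue -u "${USER}" -o "%i %j %T %M %l"
}

# Usage on a login node:
#   list_my_jobs        # find the job id of the Jupyter Server job
#   scancel 1234567     # cancel it; 1234567 is a placeholder job id
```

Cancelling the job this way should have the same effect on accounting as 'Stop My Server': the job ends and no further SBUs are charged for it.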

Switching between the JupyterLab and classic Jupyter Notebook interface

The '2021' JupyterHub only offers the classic Jupyter Notebook interface. The '2022' JupyterHub, however, will start a JupyterLab environment by default (at https://jupyter.snellius.surf.nl/2022/user/<username>/lab), to which you are forwarded automatically once the job starts. You can switch to the classic notebook interface in two ways:

  1. After JupyterLab has started, change the address in your browser to https://jupyter.snellius.surf.nl/2022/user/<username>/tree.
  2. After JupyterLab has started, go to "Help" => "Launch Classic Notebook".

If you want the classic Jupyter Notebook interface to be the default environment to which you are redirected, add the following line to the .bashrc file in your home directory on Snellius:

export JUPYTERHUB_SINGLEUSER_APP='notebook.notebookapp.NotebookApp'
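
To avoid ending up with the same line several times in your .bashrc, you could append it only when it is not there yet. This is a minimal sketch; the helper name add_line_once is made up for this example and is not part of any Snellius tooling.

```shell
#!/bin/bash
# Hypothetical helper: append a line to a file only if that exact
# line is not already present in the file.
add_line_once() {
    local file="$1" line="$2"
    touch "${file}"
    grep -qxF -- "${line}" "${file}" || printf '%s\n' "${line}" >> "${file}"
}

# Usage (run once on Snellius):
#   add_line_once ~/.bashrc "export JUPYTERHUB_SINGLEUSER_APP='notebook.notebookapp.NotebookApp'"
```

Removing the line from .bashrc again restores the default behaviour described above.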

The software environment on Snellius' JupyterHub

The software environment in which Snellius' JupyterHub starts contains a predetermined list of modules that is loaded. As an example: https://jupyter.snellius.surf.nl/2021 will load R and Python modules from the '2021' module environment on Snellius, while https://jupyter.snellius.surf.nl/2022 will load R and Python modules from the '2022' module environment. To see exactly which modules are loaded:

  1. Start a Jupyter Notebook session
  2. Click 'New' => 'Terminal'
  3. Run the 'module list' command.

Note that Python and R packages that you installed locally (i.e. using '--user' in case of pip) will be available, provided they were installed with the same versions of Python that are loaded in the JupyterHub environment.

Customising the module environment on Snellius JupyterHub

Sometimes, you may need more control over the software environment. For example, you want more modules to be loaded.

The module environment in Snellius' JupyterHub can be customised by adding a file in your home directory that lists the required modules to be loaded. For example, for the 2021 environment, you can create the directory '.jupyter-2021' directly in your home directory. In that directory, you can put a file called 'env' which will be sourced right before the Jupyter Notebook starts. You should at least put the modules in there that provide the Jupyter commands and python/R kernels:

# Example /home/$USER/.jupyter-2021/env file
module load 2021
module load JupyterHub/1.4.1-GCCcore-10.3.0
module load cuDNN/8.2.1.32-CUDA-11.3.1
module load IRkernel/1.2-foss-2021a
module load jupyter-server-proxy/3.2.1-GCCcore-10.3.0
module load jupyterlmod/2.0.2-GCCcore-10.3.0
module load SciPy-bundle/2021.05-foss-2021a
module load matplotlib/3.4.2-foss-2021a
module load jupyter-resource-usage/0.6.0-GCCcore-10.3.0
module load <whatever else you need>

# Set some environment variable that we want to use in a Jupyter Notebook
export my_env_var=1

Note that the additional modules you load need to be compatible with the JupyterHub and IRkernel modules. The easiest way to check this is to load the modules interactively and verify that you don't see any errors.
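
One way to run this check without changing your current shell is to source the 'env' file in a separate bash process. The sketch below makes two assumptions: that the 'module' command is available in a non-interactive child shell on the login node, and that a failing 'module load' returns a non-zero exit status. The helper name check_env_file is made up for this example.

```shell
#!/bin/bash
# Hypothetical helper: source an env file in a fresh bash process that
# aborts on the first failing command (-e), and report the outcome.
check_env_file() {
    local env_file="$1"
    if bash -e -c 'source "$1"' check_env_file "${env_file}"; then
        echo "OK: ${env_file} was sourced without errors"
    else
        echo "FAILED: a command in ${env_file} returned an error" >&2
        return 1
    fi
}

# Usage on a login node:
#   check_env_file ~/.jupyter-2021/env
```

Any error messages printed by the module system appear above the final OK/FAILED line.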

In addition to customising your environment through this 'env' file, you can also add a 'script' file in the same directory. Commands in this file will be evaluated in a sub-shell right before the Jupyter Notebook environment is started. It could, for example, be used to copy input to the scratch directory if you need to work on a file from the Jupyter Notebook and need to do a lot of I/O operations. Note that since this file is evaluated in a sub-shell, it won't change the environment in which the Jupyter Notebook Server runs.

# Example /home/$USER/.jupyter-2021/script file

# I always want to run this command right before starting my Jupyter Notebook Server
cp -r ~/my_input_file $TMPDIR

# Note that setting environment variables like this won't change the environment of 
# the Jupyter Notebook Server since the script-file is executed in a sub-shell.
# Thus, there is typically no reason to do this
export my_var=4
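
The sub-shell behaviour can be tried out in any bash session. The following sketch illustrates the general shell behaviour and is independent of the JupyterHub itself:

```shell
#!/bin/bash
unset my_var

# Run the assignment in a sub-shell, as happens for the 'script' file
( export my_var=4; echo "inside sub-shell: my_var=${my_var}" )

# Back in the parent shell the variable is still unset
echo "after sub-shell: my_var=${my_var:-<unset>}"
```

The first line printed shows the value 4, the second shows that the parent shell never saw the variable. Variables that the notebook needs therefore belong in the 'env' file.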

Using Python virtual environments with Snellius' JupyterHub

If you want to use a Python virtual environment in a Jupyter Notebook, you'll have to install a custom kernel that represents this virtual environment. E.g. to install a virtual environment with 'virtualenv' and make it available as a custom kernel, you'll need to:

  1. Load the Python module that is also available in the standard JupyterHub environment (i.e. follow the steps in "The software environment on Snellius' JupyterHub" to check which module that is)
  2. Create a new virtual environment
  3. Purge the modules, to avoid python packages being picked up from the module environment
  4. Activate your new virtual environment
  5. Install ipykernel in that virtual environment
  6. Install any other packages you require in the virtual environment
  7. Install the virtual environment as a custom kernel
  8. Make sure that packages from outside the virtual environment are not picked up by adding '-E' to the kernel launch command

As an example, to install a custom kernel based on a virtual environment in the https://jupyter.snellius.surf.nl/2021 JupyterHub:

# Load Python module used by the https://jupyter.snellius.surf.nl/2021 JupyterHub
module load 2021
module load JupyterHub/1.4.1-GCCcore-10.3.0

# Create virtual environment
virtualenv my_env

# Purge modules so that any subsequent pip-installs don't pick up on python packages from the module environment
module purge

# Activate virtual environment
source my_env/bin/activate
 
# Install ipykernel in the virtual environment
pip install ipykernel
 
# Install any other packages needed in the virtual environment
pip install <whatever else you want>
 
# Install the virtual environment as custom kernel. It will show up in the Jupyter Notebook Server with the name passed to the '--name' argument.
python -m ipykernel install --user --name=my_env

# Makes sure the kernel only uses Python packages from the virtual environment, not from the module environment
sed -i '/"-m",/i \ \ "-E",' ~/.local/share/jupyter/kernels/my_env/kernel.json


Note that if you use the '--system-site-packages' flag when creating the 'virtualenv', you'll need to pass the '--force' flag when you install 'ipykernel' in order to force ipykernel to be reinstalled in your virtual environment. Also, in that case, you should skip the last 'sed' command listed in the example.
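
To see what the 'sed' command from the example does before applying it to a real kernel, you can run it on a throwaway file. The '-E' option tells Python to ignore PYTHON* environment variables such as PYTHONPATH. The sketch below assumes GNU sed (the one-line 'i' syntax is a GNU extension), and the kernel.json content, including the path to the Python interpreter, is illustrative only.

```shell
#!/bin/bash
# Create an illustrative kernel.json in a temporary directory
workdir=$(mktemp -d)
cat > "${workdir}/kernel.json" <<'EOF'
{
 "argv": [
  "/home/user/my_env/bin/python",
  "-m",
  "ipykernel_launcher",
  "-f",
  "{connection_file}"
 ],
 "display_name": "my_env",
 "language": "python"
}
EOF

# Same sed expression as in the example: insert "-E", before "-m",
sed -i '/"-m",/i \ \ "-E",' "${workdir}/kernel.json"

# Show the inserted line together with the line that follows it
grep -n -A1 -- '"-E",' "${workdir}/kernel.json"
```

After the edit, the interpreter is launched as 'python -E -m ipykernel_launcher ...'.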

Using Conda virtual environments with Snellius JupyterHub

If you want to use a Conda virtual environment in a Jupyter Notebook, you'll have to install a custom kernel that represents this virtual environment. E.g. to create a virtual environment with 'conda' and make it available as a custom kernel, you'll need to:

  1. Load the Python module that is also available in the standard JupyterHub environment (i.e. follow the steps in 'The software environment on Snellius' JupyterHub' to check which module that is)
  2. Load a module that provides the conda command (e.g. one of the Miniconda3 or Anaconda3 modules)
  3. Initialize the conda shell (if you haven't already done that earlier)
  4. Create a new virtual environment
  5. Purge the modules, to avoid python packages being picked up from the module environment
  6. Activate your new virtual environment
  7. Install ipykernel in that virtual environment
  8. Install any other packages you require in the virtual environment
  9. Install the virtual environment as a custom kernel
  10. Make sure that packages from outside the virtual environment are not picked up by adding '-E' to the kernel launch command

As an example, to install a custom kernel based on a conda virtual environment in the https://jupyter.snellius.surf.nl/2021 JupyterHub:

# Load Python module used by the https://jupyter.snellius.surf.nl/2021 JupyterHub
module load 2021
module load JupyterHub/1.4.1-GCCcore-10.3.0
module load Miniconda3/4.9.2 

# Initialize conda shell
conda init bash

# Create new conda virtual environment
conda create --name my_conda_env -y

# Purge modules so that any subsequent installations don't pick up on python packages from the module environment
module purge

# Activate virtual environment
conda activate my_conda_env
 
# Install ipykernel in the virtual environment
conda install ipykernel -y
 
# Install any other packages needed in the virtual environment
conda install <whatever else you want>
 
# Install the virtual environment as custom kernel. It will show up in the Jupyter Notebook Server with the name passed to the '--name' argument.
python -m ipykernel install --user --name=my_conda_env

# Makes sure the kernel only uses Python packages from the conda environment, not from the module environment
sed -i '/"-m",/i \ \ "-E",' ~/.local/share/jupyter/kernels/my_conda_env/kernel.json
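
After installation you may want to confirm that the kernel was registered. The usual command for this is 'jupyter kernelspec list'. As a sketch that does not need the jupyter command, the hypothetical helper below lists every directory under the user kernel location used in the examples above that contains a kernel.json file.

```shell
#!/bin/bash
# Hypothetical helper: print the names of all user-installed kernels,
# i.e. sub-directories of the kernels directory holding a kernel.json.
list_user_kernels() {
    local kernels_dir="${1:-${HOME}/.local/share/jupyter/kernels}"
    local spec
    for spec in "${kernels_dir}"/*/kernel.json; do
        [ -e "${spec}" ] || continue
        basename "$(dirname "${spec}")"
    done
}

# Usage:
#   list_user_kernels
```

A kernel listed here should show up under the same name in the kernel menu of the Jupyter Notebook Server.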

Running Jupyter Notebooks on Snellius using your own batch script

The method described here allows you to run Jupyter Notebooks on Snellius (and, in fact, on any batch system) using your own batch script. This gives you full control over the environment in which the Jupyter Notebook Server runs, but requires more technical expertise than simply using the JupyterHub.

Prerequisite

You need to create a key-pair on the cluster (see these instructions) in order for step 2 (below) to work. This key-pair will allow passwordless creation of SSH connections between the login nodes and the batch nodes.

In order to run Jupyter Notebooks, we need to do two things:

  1. Start a Jupyter Notebook Server
  2. Connect to that Jupyter Notebook Server

If you run a Jupyter Notebook Server on your local machine, these steps are quite trivial: you start a Jupyter Notebook Server and browse in your web browser to e.g. http://127.0.0.1:8888 or http://localhost:8888 in order to connect to it. On a batch system, this is a bit more complicated.

First, starting the Jupyter Notebook Server will need to be done in a batch script. Second, connecting to that Jupyter Notebook Server is not trivial, since the batch nodes are not directly accessible from the internet. Thus, we'll need to tell SSH to make a so-called 'proxy jump' via the login nodes in order to connect to the batch node. In practice, the procedure consists of three steps:

  1. Launch a job that starts a Jupyter Notebook Server.
  2. Forward the port for the Jupyter Notebook Server to your local machine, via the login node.
  3. Connect to the Jupyter Notebook Server.

Step 1: batch job for launching a Jupyter Notebook Server

Before we start, you'll need to have passwordless login set up between nodes in the batch system. You can easily test if this is the case: start a small job, and try to ssh to the node on which the job starts running. If, upon ssh-ing, you get asked for your password, you don't have passwordless login set up between the nodes. Please follow these instructions for setting up passwordless login on Snellius.
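
This test can also be done non-interactively. With the ssh option BatchMode=yes, ssh fails instead of asking for a password, so the exit status tells you whether passwordless login works. The helper name and the node name 'tcn123' below are placeholders made up for this sketch.

```shell
#!/bin/bash
# Hypothetical helper: succeeds only if ssh can log in to the given
# node without prompting (BatchMode=yes disables password prompts).
can_ssh_without_password() {
    local node="$1"
    ssh -o BatchMode=yes -o ConnectTimeout=10 "${node}" true
}

# Usage from a login node while one of your jobs is running:
#   squeue -u "$USER" -o "%i %N"     # shows the node(s) of your jobs
#   can_ssh_without_password tcn123 && echo "passwordless login works"
```

If the command fails, set up the key-pair first and repeat the test.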

Like in any other batch job, we'll first need to specify the requirements for our job. E.g. if we want to start a Jupyter Notebook Server that runs for 1 hour on a node in the 'thin' partition, our job would start with:

#!/bin/bash
#SBATCH -t 1:00:00
#SBATCH -N 1
#SBATCH -p thin
#SBATCH -o jupyter-notebook-job.out

If you have particular requirements for which node should run the Jupyter Notebook Server (e.g. you want to run on a GPU node), you can specify those requirements as you would for any other batch job (e.g. specify another partition using -p or specify additional constraints using --constraint).

Next, you'll need to make sure the 'jupyter notebook' command is available, either by loading the appropriate modules, e.g.

module load 2022
module load IPython/8.5.0-GCCcore-11.3.0
module load JupyterHub/3.0.0-GCCcore-11.3.0

or by installing your own version of Jupyter (e.g. in a virtual/conda environment). If you did the latter, you'll have to make sure to activate the virtual environment in the job script.

Next, we'll select a port to run the Jupyter Notebook Server on. We'll select a random port between 5000 and 5999 to reduce the chance that your choice of port clashes with that of another user on the system. Since we'll need to know this port later on in order to establish the connection, we print it to the standard output of the job. We also specify the hostname of the login host (LOGIN_HOST) and the batch node on which our Jupyter job runs (BATCH_HOST). This is an example for Snellius, so you'll need to replace the LOGIN_HOST if you work on another system.

PORT=$(shuf -i 5000-5999 -n 1)
LOGIN_HOST=${SLURM_SUBMIT_HOST}-pub.snellius.surf.nl
BATCH_HOST=$(hostname)

echo "To connect to the notebook type the following command from your local terminal:"
echo "ssh -N -J ${USER}@${LOGIN_HOST} ${USER}@${BATCH_HOST} -L ${PORT}:localhost:${PORT}"
echo
echo "After connection is established in your local browser go to the address:"
echo "http://localhost:${PORT}"
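
The shuf command is part of GNU coreutils and may be missing on other systems. As a sketch, the port selection could fall back to bash's built-in $RANDOM in that case. Note that neither variant checks whether the chosen port is already in use.

```shell
#!/bin/bash
# Pick a random port in the range 5000-5999, preferring shuf when present
if command -v shuf >/dev/null 2>&1; then
    PORT=$(shuf -i 5000-5999 -n 1)
else
    PORT=$(( 5000 + RANDOM % 1000 ))
fi
echo "Selected port: ${PORT}"
```

If the Jupyter Notebook Server reports that the port is taken, simply resubmit the job to draw a new port.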


Finally, we start the Jupyter Notebook Server:

jupyter notebook --no-browser --port $PORT

Thus, the full job script now looks like this:

#!/bin/bash
#SBATCH -t 1:00:00
#SBATCH -N 1
#SBATCH -p thin
#SBATCH -o jupyter-notebook-job.out
 
# Make sure the jupyter command is available, either by loading the appropriate modules, sourcing your own virtual environment, etc.
module load 2022
module load IPython/8.5.0-GCCcore-11.3.0
module load JupyterHub/3.0.0-GCCcore-11.3.0

# Choose random port and print instructions to connect
PORT=$(shuf -i 5000-5999 -n 1)
LOGIN_HOST=${SLURM_SUBMIT_HOST}-pub.snellius.surf.nl
BATCH_HOST=$(hostname)

echo "To connect to the notebook type the following command into your local terminal:"
echo "ssh -N -J ${USER}@${LOGIN_HOST} ${USER}@${BATCH_HOST} -L ${PORT}:localhost:${PORT}"
echo
echo "After connection is established in your local browser go to the address:"
echo "http://localhost:${PORT}"

jupyter notebook --no-browser --port $PORT
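
The job script is submitted like any other batch job. The sketch below wraps the submission in a small helper that returns the job id, using the '--parsable' option of sbatch. The helper name and the file name jupyter-notebook-job.sh are assumptions made for this example.

```shell
#!/bin/bash
# Hypothetical helper: submit a job script and print only the job id.
submit_jupyter_job() {
    local job_script="$1"
    local job_id
    job_id=$(sbatch --parsable "${job_script}") || return 1
    # With --parsable, sbatch prints "<jobid>" or "<jobid>;<cluster>"
    job_id=${job_id%%;*}
    echo "${job_id}"
}

# Usage on a login node:
#   job_id=$(submit_jupyter_job jupyter-notebook-job.sh)
#   squeue -j "${job_id}"     # check whether the job is running
#   scancel "${job_id}"       # stop the server when you are done
```

Cancelling the job once you are finished avoids being charged for the remainder of the requested walltime.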

Step 2: Forward the port for the Jupyter Notebook Server to your local machine, via the login node

When the job has started, check the generated output file:

cat jupyter-notebook-job.out

The output contains instructions to set up an ssh connection to the batch node, using a proxy jump via the login node. This connection forwards the port on which the Jupyter Notebook Server runs on the batch node to your local machine. You'll need to execute this command on your local machine:

To connect to the notebook type the following command into your local terminal:
<<< ssh command to copy&paste >>>
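
If the output file contains a lot of other text, the ssh command can be extracted with grep. This is a sketch that relies on the command starting with 'ssh -N -J', as printed by the job script above; the helper name print_ssh_command is made up for this example.

```shell
#!/bin/bash
# Hypothetical helper: print the first ssh command found in the job output.
# Meaning of the ssh options used in that command:
#   -N  do not run a remote command, only forward ports
#   -J  jump through the login node to reach the batch node
#   -L  forward a local port to the same port on the batch node
print_ssh_command() {
    local out_file="${1:-jupyter-notebook-job.out}"
    grep -m 1 '^ssh -N -J ' "${out_file}"
}

# Usage on a login node, in the directory where the job was submitted:
#   print_ssh_command
```

Because of '-N', the ssh command prints nothing and appears to hang once the connection is established. Leave it running for as long as you use the notebook.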

Step 3: connect to the running Jupyter Notebook Server

The 'jupyter notebook' command will print the correct web address that you'll need to go to in your browser. Typically, this looks something like http://localhost:5180/?token=2acb3e6c2a783d7387465abb57f95f603f1c18f2fd9d89cb, where the 5180 should be replaced by your specific random port number. The token is unique for every Jupyter Notebook Server session that is started. It prevents other people on the same system from being able to connect to your Jupyter Notebook Server, so treat it like you would a password: don't share it with anyone.
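
The address including the token can be looked up in the job output file. The sketch below assumes that the messages of the 'jupyter notebook' command end up in the same file as the job's standard output (the Slurm default when only -o is given) and that the address has the form shown above. The helper name find_notebook_url is made up for this example.

```shell
#!/bin/bash
# Hypothetical helper: print the first localhost address with a token
# that appears in the job output file.
find_notebook_url() {
    local out_file="${1:-jupyter-notebook-job.out}"
    grep -o 'http://localhost:[0-9]*/?token=[0-9a-f]*' "${out_file}" | head -n 1
}

# Usage on a login node, in the directory where the job was submitted:
#   find_notebook_url
```

Since the printed address contains the token, avoid pasting it into shared chats or tickets.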
