Some research requires working with sensitive data, for example data containing personal information, health data, or copyrighted data. Such data are often not publicly accessible and can only be obtained through a formal procedure. The party that provides the data for research (the data provider) may require that the data are not simply downloaded, but are worked with inside a controlled, secure environment. Following the Five Safes principles, the data provider remains in full control of the data, as stipulated by the General Data Protection Regulation (GDPR).

In SURF Research Cloud, researchers can work with sensitive data in a Secure ANalysis Environment (SANE). SANE falls under the ISO 27001 certification of Research Cloud and has passed a rigorous independent penetration test. SANE is a special construction within Research Cloud in which a data provider and one or more researchers form a Collaboration (CO) through SRAM, specifically for the purpose of conducting research on sensitive data. The data provider uploads the data to a secure analysis environment, where only the researchers within that Collaboration can analyse the data. Data placed in SANE cannot be extracted to the outside world, with the exception of aggregated output results that have been approved by the data provider. There are two types of SANE: 1) Tinker, 2) Blind.

Tinker SANE

In a Tinker SANE environment a researcher can see the data and tinker with it, but not extract it. This type of SANE is especially useful for research projects where the researcher combines several data sources and where specific characteristics of the combined data determine the subsequent analytical steps. Through a restricted Remote Desktop workspace, the researcher can log into the SANE and work with the data as they would on their own machine. The main difference between SANE and a regular Research Cloud workspace is that there is no internet connection and no clipboard.

Currently, Tinker SANE is available as a virtual Windows Server 2019 desktop containing the analysis software RStudio (version 2023.09.1) and JupyterLab Desktop (latest).

Blind SANE

In a Blind SANE environment a researcher can neither see nor extract the data, but can submit scripts that analyse the data. The data provider expects researchers to work 'blindly' with the data.

This type of SANE can be used when the data is structured and enough metadata is available to prepare the tools and analysis scripts in advance. Alternatively, the data provider can supply a sample and/or synthetic dataset for this purpose. As in Tinker SANE, there is no internet connection within a Blind SANE workspace. Unlike Tinker SANE, there is no way to log into a Blind SANE workspace.

Currently, Blind SANE is available as a headless Ubuntu 20.04 Linux virtual machine.
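
To make the 'blind' way of working more concrete, here is a minimal sketch of an analysis script that is developed against a synthetic sample and only produces aggregated output. It is an illustration, not a SANE interface: the column names (`patient_id`, `region`), the function `aggregate_counts`, and the minimum group size of 3 are all hypothetical. The actual rules for what counts as safe output are set by the data provider.

```python
import csv
import io
from collections import Counter

# Hypothetical threshold: groups smaller than this are suppressed. The real
# rule is whatever the data provider requires before approving output.
MIN_CELL_SIZE = 3


def aggregate_counts(rows, group_key, min_cell_size=MIN_CELL_SIZE):
    """Count rows per group and suppress groups that are too small to release."""
    counts = Counter(row[group_key] for row in rows)
    return {
        group: (n if n >= min_cell_size else None)  # None marks a suppressed cell
        for group, n in sorted(counts.items())
    }


# Synthetic sample standing in for the real data, which the researcher never sees.
SYNTHETIC_CSV = """patient_id,region
1,north
2,north
3,north
4,south
5,south
6,east
"""

rows = list(csv.DictReader(io.StringIO(SYNTHETIC_CSV)))
print(aggregate_counts(rows, "region"))
```

Because the script only returns group-level counts and hides small groups, its output is the kind of aggregated result a data provider could review before release. The same script can then be submitted unchanged to run on the real data.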

Prerequisites

To use SANE, there are a few prerequisites. 

  • There should be a SURF Research Cloud contract, which is free of charge, in place between SURF and the data provider. It serves as a Data Processing Agreement and is required by the GDPR. This contract is already in place for most Dutch research and educational institutes (see: https://www.surf.nl/en/about-surf/surf-members). Data providers that fall outside this scope can contact the SURF servicedesk (select Research Cloud as the corresponding service in your ticket).
  • The researcher needs sufficient funds to use the SANE analysis environment, i.e. sufficient funds to consume resources on Research Cloud (see: https://www.surf.nl/en/surf-research-cloud-collaboration-portal-for-research). You can obtain funding in two ways:
    • A national grant: Small Applications
    • The institute of the researcher has an RCCS contract with SURF. With this contract the institute buys credits that can be distributed to the researcher. Contact your institute’s central IT department for credits.

Researchers that fall outside this scope can contact the SURF servicedesk.

  • If the data provider does not want to depend on SURF for creating the CO, or wants to use COs that are connected to services not provided by SURF, the data provider should have their own SRAM contract with SURF. SRAM is already in the core services list for most SURF members. Find an overview of institutes with an SRAM contract here: https://servicedesk.surf.nl/wiki/display/WIKI/Institutions+using+SRAM
  • We advise that there is a data sharing agreement between the data provider and the (institute of the) researcher. This agreement is to be arranged bilaterally without any SURF involvement. 

Logging into the SANE workspaces might require the use of a time-based one-time password (TOTP). For instructions on how to set this up, see the "Workspace Access with TOTP" section of our documentation page "Log in to your workspace".
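
For readers unfamiliar with TOTP, the sketch below shows what an authenticator app computes, following the public RFC 6238 algorithm. It is only an illustration and not part of the Research Cloud setup procedure. The 30-second time step, 6 digits, and HMAC-SHA-1 are the common defaults and are assumptions here, and `DEMO_SECRET` is the RFC test key, not a real credential.

```python
import base64
import hashlib
import hmac
import struct
import time


def totp(secret_b32, at_time=None, step=30, digits=6):
    """Compute a time-based one-time password as described in RFC 6238 (HMAC-SHA-1)."""
    key = base64.b32decode(secret_b32, casefold=True)
    now = time.time() if at_time is None else at_time
    counter = int(now // step)  # number of time steps elapsed since the Unix epoch
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation as defined in RFC 4226
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


# Demo secret only: base32 encoding of the RFC 6238 test key "12345678901234567890".
DEMO_SECRET = "GEZDGNBVGY3TQOJQGEZDGNBVGY3TQOJQ"

print(totp(DEMO_SECRET))  # changes every 30 seconds
```

The shared secret is exchanged once during setup, after which both sides derive the same short-lived code from the current time. This is why the clock on the device running the authenticator app needs to be reasonably accurate.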

Five Safes

SANE is built around the Five Safes principles, a set of best practices in data protection developed by UKDA. The principles cover five aspects (Safe People, Safe Output, Safe Settings, Safe Projects, Safe Data) that together make the use of sensitive data effective and confidential.

  1. Safe People: To access sensitive data, researchers in SANE are required to complete the application process and be authorised. Authentication and Authorization Infrastructure (AAI) for SANE is built in via SURF Research Access Management (SRAM). SRAM gives data providers confidence that the researcher has been authorised, for example by being affiliated with a Dutch university or research facility. If the researcher does not have access to SRAM, access can be obtained via EduID. In that case, however, the researcher's identity cannot be verified automatically.
  2. Safe Output: All research outputs should be verified by the data provider before researchers are allowed to use them further in their projects. This step protects the sensitive data by making sure that no sensitive information is present in the output.
  3. Safe Settings: SANE is locked off from the internet, so the connection to the safe environment is the only one available to the researcher while working with sensitive data. With Blind SANE, the data provider can even prevent the researcher from seeing the data. These settings not only make it possible to monitor the researcher's actions, but also create safe access to the data by requiring the researcher to use two-factor authentication (2FA).
  4. Safe Projects: As a part of the application process to use SANE, the data provider typically vets the research purpose and the purpose of using sensitive data. This allows data providers to decide whether the sensitive data can be used, preventing the unethical usage of sensitive data. Moreover, each research project is performed within its own isolated SANE environment. 
  5. Safe Data: The data provider is advised to give access to the minimal amount of data needed to answer the research question, and to pseudonymise the data set before granting access to the researchers. This prevents researchers from having access to directly identifying information while working in the secure environment. Uploading additional data can also be prevented, as combining data sets may result in de-anonymisation.
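
As an illustration of the Safe Data advice, the sketch below shows one way a data provider might minimise and pseudonymise a data set before uploading it. This is not a SANE feature or a prescribed method: the function `pseudonymise`, the field names, and the choice of a keyed HMAC-SHA-256 hash are assumptions made for the example. The key must stay with the data provider, because anyone who holds it can recompute the pseudonyms.

```python
import hashlib
import hmac


def pseudonymise(records, id_field, keep_fields, secret_key):
    """Replace the direct identifier with a keyed hash and keep only the needed fields."""
    result = []
    for record in records:
        token = hmac.new(
            secret_key, str(record[id_field]).encode("utf-8"), hashlib.sha256
        ).hexdigest()
        minimal = {field: record[field] for field in keep_fields}  # data minimisation
        minimal["pseudonym"] = token  # same identifier always maps to the same token
        result.append(minimal)
    return result


# Hypothetical example records and key, for illustration only.
records = [
    {"patient_id": 17, "name": "A. Example", "age": 42},
    {"patient_id": 18, "name": "B. Example", "age": 57},
]
for row in pseudonymise(records, "patient_id", ["age"], b"provider-held-secret"):
    print(row)
```

Using a keyed hash rather than a plain hash means the pseudonyms cannot be reproduced by simply hashing candidate identifiers. Pseudonymised data can still be re-identified in combination with other data, which is why limiting uploads of additional data sets remains relevant.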

