
In the Blind SANE environment, you cannot see the sensitive data you work with. Instead, you prepare a script or container that is executed 'blindly' in this environment, without access to the internet. The data provider may choose to provide you with a sample and/or synthetic data set to prepare your analysis script. The data provider checks and releases the outputs of the script. In this document, you will find the steps to work with Blind SANE.

1. Prepare your analysis

Your analysis can run via a Python script or a Docker container. Currently, the default method is a Python script. If you wish to use a Docker container, please contact the SURF servicedesk to enable this for your project.

A Python script

Write and test a Python script that performs the analysis. Use synthetic data or a subset of the sensitive dataset (if your data provider allows) to functionally test your script.
Upload your script to a public git repository and name it script.py

If the script is prepared by the data provider, you will find it at /data/sane-data/scripts/.

The repository should also contain a requirements.txt containing a list of all the pip packages needed for the analysis. For more information, see: https://pip.pypa.io/en/stable/reference/requirements-file-format/#

The Python script will be called as follows:

python3 script.py -i <input-dir> -o <output-dir> -t <temp-dir> 1> stdout.txt 2> stderr.txt

All output that the script prints to stdout and stderr is redirected to stdout.txt and stderr.txt respectively, and both files are made available in the output directory.
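
The call above implies that script.py must accept the -i, -o and -t options. A minimal sketch of such a script is given here, assuming the input directory holds CSV files; the row count, the file name summary.json and the default directories for local test runs are hypothetical and not prescribed by Blind SANE.

```python
#!/usr/bin/env python3
"""Sketch of a script.py: counts the data rows of each CSV file in the input directory."""
import argparse
import csv
import json
import sys
from pathlib import Path


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Blind SANE analysis sketch")
    # The defaults are an assumption for local test runs only;
    # the documented call passes all three options explicitly.
    parser.add_argument("-i", dest="input_dir", type=Path, default=Path("input"))
    parser.add_argument("-o", dest="output_dir", type=Path, default=Path("output"))
    # Scratch space for intermediate files; unused in this sketch.
    parser.add_argument("-t", dest="temp_dir", type=Path, default=Path("tmp"))
    return parser.parse_args(argv)


def count_rows(csv_path):
    """Return the number of rows after the first (header) row of one CSV file."""
    with csv_path.open(newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row
        return sum(1 for _ in reader)


def main(argv=None):
    args = parse_args(argv)
    args.output_dir.mkdir(parents=True, exist_ok=True)

    # Hypothetical analysis: aggregate counts only, no record-level values.
    csv_files = sorted(args.input_dir.glob("*.csv")) if args.input_dir.is_dir() else []
    summary = {path.name: count_rows(path) for path in csv_files}
    (args.output_dir / "summary.json").write_text(json.dumps(summary, indent=2))

    print(f"processed {len(summary)} file(s)")  # ends up in stdout.txt
    if not summary:
        print("no CSV files found in the input directory", file=sys.stderr)  # ends up in stderr.txt


if __name__ == "__main__":
    main()
```

Because you cannot inspect the run itself, it helps to print progress and problems explicitly: stdout.txt and stderr.txt are the only trace of what happened. The same script can be tested locally on synthetic data with, for example, python3 script.py -i sample -o out -t tmp.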

A Docker container

Create a Dockerfile that performs the analysis. You can run and test the resulting Docker image on your local computer.
Upload your Dockerfile to a public git repository or upload the image to Docker Hub

When writing your container, keep in mind that the sensitive dataset will be made available to you at the directory /input within the container and that your results should be written to /output within the container.
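
As an illustration of the /input and /output convention, the following sketch shows a Python program that a container could run as its entry point. It only writes an inventory of the input files; the environment variables ANALYSIS_INPUT_DIR and ANALYSIS_OUTPUT_DIR and the file name manifest.json are hypothetical and serve only to allow test runs outside a container.

```python
#!/usr/bin/env python3
"""Container entry point sketch: inventory /input and write a manifest to /output."""
import json
import os
import sys
from pathlib import Path

# /input and /output are the directories named in the text above.
# The environment variable overrides are an assumption for local testing.
INPUT_DIR = Path(os.environ.get("ANALYSIS_INPUT_DIR", "/input"))
OUTPUT_DIR = Path(os.environ.get("ANALYSIS_OUTPUT_DIR", "/output"))


def build_manifest(input_dir):
    """Return the relative path and size in bytes of every file under input_dir."""
    return [
        {"file": path.relative_to(input_dir).as_posix(), "bytes": path.stat().st_size}
        for path in sorted(input_dir.rglob("*"))
        if path.is_file()
    ]


def run(input_dir=INPUT_DIR, output_dir=OUTPUT_DIR):
    """Write manifest.json to output_dir; return False if input_dir is missing."""
    if not input_dir.is_dir():
        # Stop before writing anything when the dataset is not mounted.
        print(f"input directory {input_dir} not found", file=sys.stderr)
        return False
    output_dir.mkdir(parents=True, exist_ok=True)
    manifest = build_manifest(input_dir)
    (output_dir / "manifest.json").write_text(json.dumps(manifest, indent=2))
    return True


if __name__ == "__main__":
    run()
```

Keeping the paths in two constants means the analysis code never hard-codes /input or /output elsewhere, so the same image can be tested locally against a synthetic data set by mounting or overriding those two directories.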

2. Log in to the Research Cloud portal

Go to the SURF Research Cloud portal and log in with the identity with which the data provider invited you to the Collaborative Organisation (CO).

3. Create a Blind workspace

Click on "Create new workspace"
Select the CO to which you were invited
Select the catalog item "Blind SANE"
Select a flavour for your virtual environment that is suitable for your application
In the 'Options' step select the private network that is made available to you
In the final step of the wizard specify the following, depending on your method of choice:
1. Python script: In the field 'python_script_source' fill in the public git repository containing your script.py (ending with .git) OR the name of the folder within the /data/sane-data/scripts/ folder in which the data provider placed the script.py file
2. Docker: In the field 'docker_repo_url' fill in the public git repository URL to your Dockerfile OR in the field 'docker_image_name' provide the name of your Docker image
Finalise the wizard. Set the expiration date to a date by which you are sure the analysis will have been completed.

4. Perform analysis

Your analysis will run in the background and you will receive an e-mail when the analysis has been completed. Upon completion, it is your responsibility to delete the workspace. Not deleting the workspace will result in credits being unnecessarily consumed.

5. Prepare output results

The results are written to the directory /results. Ask the data provider outside of SANE (e.g. via e-mail) to review these results. The data provider will make the output results available outside of SANE. The results are not deleted when you delete your workspace, because they are stored on a shared filesystem.