In the Blind SANE environment you cannot see the sensitive data you work with. Instead, you prepare a script or container that is executed 'blindly' in this environment. The script or container has no access to the internet while it runs. The data provider may choose to provide you with a sample and/or synthetic data set to help you prepare your analysis script. The data provider checks and releases the outputs of the script. This document lists the steps for working with Blind SANE.
1. Prepare your analysis
Your analysis can run via a Python script or via a Docker container.
A Python script
- Write and test a Python script that performs the analysis. Use synthetic data or a subset of the sensitive dataset (if your data provider allows) to functionally test your script.
- Upload your script to a public git repository and name it script.py.
If the data provider prepared the script, you will find it in the directory /data/sane-data/scripts/
The repository should also contain a requirements.txt file listing all the pip packages needed for the analysis. For more information, see: https://pip.pypa.io/en/stable/reference/requirements-file-format/#
The Python script will be called as follows:
python3 script.py -i <input-dir> -o <output-dir> -t <temp-dir> 1> stdout.txt 2> stderr.txt
All output that the script prints to stdout and stderr is redirected to stdout.txt and stderr.txt respectively, and both files are made available in the output directory.
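The calling convention above can be sketched as a minimal script.py. This is only an illustration under stated assumptions, not a reference implementation: the row-counting analysis, the output file name row_counts.csv, and the default directory names (a convenience for local runs without arguments) are hypothetical. Only the -i, -o and -t flags come from the command line shown above.

```python
import argparse
import csv
from pathlib import Path


def parse_args(argv=None):
    """Parse the -i/-o/-t flags used in the call shown above."""
    parser = argparse.ArgumentParser(description="Blind SANE analysis sketch")
    # The defaults are hypothetical and only meant for local test runs.
    parser.add_argument("-i", dest="input_dir", type=Path, default=Path("input"))
    parser.add_argument("-o", dest="output_dir", type=Path, default=Path("output"))
    parser.add_argument("-t", dest="temp_dir", type=Path, default=Path("tmp"))
    return parser.parse_args(argv)


def count_rows(input_dir):
    """Hypothetical analysis: count the data rows of every CSV file."""
    counts = {}
    for path in sorted(input_dir.glob("*.csv")):
        with path.open(newline="") as handle:
            reader = csv.reader(handle)
            next(reader, None)  # skip the header row, if any
            counts[path.name] = sum(1 for _ in reader)
    return counts


def main(argv=None):
    args = parse_args(argv)
    args.output_dir.mkdir(parents=True, exist_ok=True)
    args.temp_dir.mkdir(parents=True, exist_ok=True)

    counts = count_rows(args.input_dir)

    # Write only aggregate results; the data provider reviews this file.
    with (args.output_dir / "row_counts.csv").open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["file", "rows"])
        for name, rows in counts.items():
            writer.writerow([name, rows])

    # Anything printed here ends up in stdout.txt.
    print(f"Processed {len(counts)} file(s)")


if __name__ == "__main__":
    main()
```

To test this locally, place a few synthetic CSV files in a directory and run the script with the same flags, for example python3 script.py -i synthetic -o out -t tmp, then inspect out/row_counts.csv.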
A Docker container
- Create a Dockerfile that performs the analysis. You can run and test the resulting Docker image on your local computer.
- Upload your Dockerfile to a public git repository, or upload the image to Docker Hub.
When writing your container, keep in mind that the sensitive dataset will be made available in the directory /input within the container and that your results should be written to /output within the container.
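As an illustration of the /input and /output convention, the program started by the container could look roughly like the sketch below. It assumes the container runs a Python entry point. The summarise function and the summary.json file name are hypothetical and stand in for your actual analysis.

```python
import json
import sys
from pathlib import Path

# Directories named in the text above; they only exist inside the container.
INPUT_DIR = Path("/input")
OUTPUT_DIR = Path("/output")


def summarise(input_dir, output_dir):
    """Hypothetical analysis: report the number and total size of input files."""
    files = sorted(p for p in input_dir.rglob("*") if p.is_file())
    summary = {
        "file_count": len(files),
        "total_bytes": sum(p.stat().st_size for p in files),
    }
    output_dir.mkdir(parents=True, exist_ok=True)
    (output_dir / "summary.json").write_text(json.dumps(summary, indent=2))
    return summary


if __name__ == "__main__":
    if INPUT_DIR.is_dir():
        summarise(INPUT_DIR, OUTPUT_DIR)
    else:
        # Outside the container there is no /input, so report that and stop.
        print(f"{INPUT_DIR} not found; run this inside the container", file=sys.stderr)
```

Because the function takes both directories as parameters, you can call it on a local folder of synthetic data before building the image.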
2. Log in to the Research Cloud portal
Go to the SURF Research Cloud portal and log in with the identity with which the data provider invited you to the Collaborative Organisation (CO).
3. Create a Blind workspace
- Click on "Create new workspace"
- Select the CO to which you were invited
- Select the catalog item "Blind SANE"
- Select a flavour for your virtual environment that is suitable for your application
- In the 'Options' step select the private network that is made available to you
- In the final step of the wizard, specify the URL of the public git repository that contains your Python script or Dockerfile, or provide the name of your Docker image (see step 1)
- Finalise the wizard. Set the expiration date to a date after which you are sure the analysis will be completed.
4. Perform analysis
Your analysis will run in the background and you will receive an e-mail when the analysis has been completed. Upon completion it is your responsibility to delete the workspace.
5. Prepare output results
The results are written to the directory /results.
Ask the data provider outside of SANE (e.g. via e-mail) to review these results. The data provider will make the output results available outside of SANE. The data will not be deleted when you delete your workspace, because it is stored on a shared filesystem.