This tutorial allows you to exercise the typical Research Data Management flows that iBridges supports.
iBridges is a piece of software that runs on top of iRODS as a graphical user interface. Typically, the IT department of your institution offers an iRODS or Yoda instance, which you can access through a choice of interfaces, including iBridges and iCommands.
For this tutorial, we will assume that your institute is using iRODS and we will use the corresponding terminology. As a brief reminder, a data object is the combination of a file (dataset) and its associated metadata. In terms of best RDM practices, that metadata belongs to the dataset; it is an intrinsic part of it. Data objects are grouped into collections, which may be thought of as folders and can also have associated metadata. Metadata consists of pieces of information that accompany the dataset or collection, not only to describe it but also to make it findable by searching through that information. The main idea behind this way of collaborating around data is that scientists can gather data and share it with others, each working at their own pace. Thus, years after a dataset has been put together, somebody else can find it and reuse it in new research projects.
All this interaction is possible with iBridges. In this tutorial, we will be showcasing and exercising these flows as though you are a scientist or a data steward.
1. Finding and reusing existing data
This exercise will teach you how to search for and download datasets using the iBridges interface.
In this scenario, we are going to pretend that you are a researcher who knows that a dataset exists somewhere in the iRODS instance of your institute. You want to use this dataset, but you only have limited information about it - that it involves a picture taken over Disko Bay in Greenland. That is precisely what you need for your research! Let us find it now.
1.1 Searching for a dataset
In iBridges, you can search for datasets that have been uploaded to your iRODS instance by you or by others within your groups. You can search by one or more of:
- object name
- collection name
- metadata (key-value pairs)
- checksum
Please note as you follow these steps that the search fields are case sensitive.
a) Search by object name
- From the program bar at the top of your screen, select 'Options' and 'Search'
- Enter the full name of the item you are searching for under 'Object name', or use the wildcard '%' to stand in for any run of characters at the start, middle or end of the string (or in several places) as required, e.g.:
- DiskoBay.png
- %Bay.png
- %Disko%
- Hit "Enter" to run the search
The result is probably going to disappoint you: you will not find anything, because the desired item has a different name. Let us try a different search method.
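The wildcard patterns above can be mimicked in a few lines of Python. This is only a sketch of the matching idea, assuming that '%' stands for any run of characters and that matching is case sensitive, as noted earlier. The function names and the object names in the list are hypothetical and are not the names used in the training instance.

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a '%' wildcard pattern into a case-sensitive regex."""
    # Escape every literal piece, then put '.*' wherever a '%' was.
    pieces = [re.escape(piece) for piece in pattern.split("%")]
    return re.compile(".*".join(pieces), re.DOTALL)


def matches(pattern: str, name: str) -> bool:
    """Return True when the whole name matches the wildcard pattern."""
    return like_to_regex(pattern).fullmatch(name) is not None


# Hypothetical object names, used only to exercise the matcher.
names = ["DiskoBay.png", "diskobay.png", "IMG_0042.png"]

for pattern in ["DiskoBay.png", "%Bay.png", "%Disko%"]:
    hits = [name for name in names if matches(pattern, name)]
    print(pattern, "->", hits)
```

Note that the lower-case name is never matched by any of the three patterns, which is the practical consequence of the case-sensitive search fields.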
b) Search by collection
Let's see if the data you are after is in a collection containing the word 'Disko'.
- In the Search box, enter '%Disko%' in the 'Collection name' field.
- Hit "Enter" to run the search
The result is probably going to disappoint you this time as well: you will not find anything. Let us try yet a different search method.
c) Search by metadata
- In the Search box, enter 'KEYWORD' in the first 'Key' field. (All keys are saved in uppercase.)
- In the corresponding 'Value' field, enter a word that you think is reasonable for the limited information you have about the dataset, such as: "ocean" or "Disko" or "Greenland"
- Hit "Enter" to run the search
Voilà! You should now have at least one result. But do you know whether it is the right one? Let's see how we can inspect the file.
1.2 Viewing and downloading your search results
- From the list of results of your search, click on the one that you want to work with
- At the bottom of the search box, choose 'Select and Close'. (Other options are available to download the item(s) or to quit without action)
- Now you are viewing the search result from the iBridges browser tab. Select the item and choose the 'Metadata' tab to view the associated metadata.
- Looking through the information shown about this file in the browser, can you now answer some of the questions to the right of this hand-out? For example: can you now explain why you were not able to find the dataset when searching by name or by folder, but you were when you searched by metadata?
- Use the 'File Download' button from the browser view of your search results to download the image of Disko Bay. Now you can do your research!
⟡ ⟡ ⟡
You have now completed this section and should understand the flow to find and reuse data. Feel free to move on to the next exercise at your own pace, but make sure you have answered the questions on the right to verify that you have found the intended dataset.
Flow
Questions to answer throughout this section:
- What is the file name of the picture?
- What is the folder name of the picture?
- When was the picture taken?
- Who took the picture? What is their affiliation?
- Which three location tags have been given to the picture?
- What does the picture show (i.e., can you describe what the photograph has captured)?
Food for thought:
- What is the name of the root folder of the dataset?
2. Importing and managing data
In this scenario we are going to pretend that you are a researcher who has been collecting data, and wants to store it in iRODS through the iBridges interface. You will be assuming the role of a seasoned data practitioner who will complete the process without intervention from a data steward.
By the end of the exercise you will know how to import data, creating and editing the metadata as required.
You can work with any dataset you may have already on your laptop, or you can download something from the Internet. For example, you can use the data portal from the Gemeente Amsterdam to search for data that may appeal to you: https://data.amsterdam.nl/datasets/zoek/ (Please verify the dataset license before you use it!)
2.1 Preparing a working place in your project area
The default page in iBridges is the 'Browser' tab showing the contents of your home directory, which is listed under 'iRODS path'.
We will now pretend that you are working in a project of your own. You will therefore need to create a collection for that project. Here are the steps you need to follow:
- From your home directory, navigate to the training-area collection by double-clicking on the name. If you make a mistake and enter the wrong space, use the blue return arrow next to the iRODS path to return to your prior location (or the orange home button to return to your home directory).
- Click on the Create Collection button.
- Give this new collection a name with the format "Project X", where X should be something that you will be happy to work with, such as: "Project Peter" or "Project Flamingos". Please remember what you choose, because the other course attendees will be creating their own collections here too.
2.2 Adding metadata to your new collection
In iBridges, you can add or modify metadata at the level of the collection and for individual data objects.
Each metadata tag is composed of:
- key (mandatory)
- value (mandatory)
- unit (optional)
(In iRODS terminology, these are referred to as AVU triplets. This stands for Attribute-Value-Unit.)
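As a minimal sketch, such a triplet could be modelled in Python as shown here. The class name `AVU`, the validation rule and the example tags are assumptions made for illustration; they are not part of iBridges or iRODS.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AVU:
    """One metadata tag: key (attribute), value, and an optional unit."""

    key: str
    value: str
    unit: Optional[str] = None

    def __post_init__(self) -> None:
        # Key and value are mandatory, so reject empty strings early.
        if not self.key or not self.value:
            raise ValueError("key and value are mandatory")


# Hypothetical tags for a project collection; keys and values are examples only.
tags = [
    AVU("KEYWORD", "Greenland"),
    AVU("KEYWORD", "ocean"),
    AVU("ALTITUDE", "1200", "m"),
]

for tag in tags:
    print(tag.key, tag.value, tag.unit or "")
```

The example repeats the same key with different values, which is one way to attach several keywords to a single collection or data object.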
The dataset's metadata is crucial when you are working within RDM best practices. It will ensure that your dataset is reusable in the future. It is therefore best to start with the metadata, even before the data exists in iRODS. Let us tackle that right now by adding metadata at the collection level.
- Remaining in the training-area collection view, click on the collection you just created.
- Select the 'Metadata' tab
- Now take all the time you need to think about what is reasonable metadata, and make sure you write plenty of it. Recall the feeling when you were searching for data in the previous exercise.
- For inspiration, what would have helped you to be more effective in finding the dataset? Apply that now to help others find your dataset, both when they know it is there and when they do not. The latter case is a data discovery scenario.
- If you are working with a dataset that is published somewhere else (e.g., on the Gemeente Amsterdam portal), you can draw ideas from the metadata already shown in that portal.
- For datasets that involve spatial or temporal information, make sure you fill in appropriate intervals and location descriptors. You may look at the previous exercise's dataset to see how you can include multiple location descriptors.
- Think of the data policies from your research field or your institution. How could you use metadata to meet the requirements of those policies?
- Add the metadata you come up with by entering the 'Key', 'Value' and (optionally) the 'Unit' in the respective fields and using the 'Add' button
2.3 Uploading data
Now that you have some metadata defined for your project collection, you are going to import the dataset you just found into the project folder that you created under the collection training-area a few steps ago. Remember? You called it Project <something>. For this exercise and for simplicity's sake, it will be enough to upload one or two files no larger than a few megabytes as though they are a full dataset; adding more would be overkill today.
- Return to the view of your project folder in the training-area collection
- Click on 'File Upload'
- Locate your file, click 'Open' and confirm that you would like to upload the item to your collection
2.4 Adding metadata to data objects
You have now imported one or more files that you can use for your research into your project folder. Considering the list in section 2.2, think about what sort of metadata should be stored at the level of the individual data files.
- Select a file that you have uploaded and navigate to the 'Metadata' tab
- Add metadata using the flow you are familiar with from section 2.2
- If you are working through these exercises with colleagues, now would be a good moment to ask them to verify your metadata at both the collection and the file levels and engage in a little discussion to see if you agree on what you have written
2.5 Changing permissions
In iRODS, data access permissions are managed through groups. Are you curious about which groups you belong to? At the top of the iBridges window, switch to the 'Info' tab to view your username, groups and server information for the iRODS instance you are connected to.
Now your project folder is complete with a dataset and high quality metadata. Let us now ensure that other intended users will have the right level of access to these files.
- In the training-area collection, click to select your project folder
- Navigate to the 'Permissions' tab
- In the table you can see which users and groups have access rights to the data objects in that collection
- Select the 'oceanviewers' group
- These users should be able to view, but not change data in this collection. Change their 'Access' level to 'read' and click 'Add/Update'.
⟡ ⟡ ⟡
Well done! You have now completed an RDM workflow, taking good care over the findability of your dataset.
Flow
Food for thought:
- For iCommands users: consider how these workflows would be tackled on the command line. E.g., uploading and downloading files can be done with the 'iput' and 'iget' commands.
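As one possible answer, the sketch below assembles iCommands command lines that roughly mirror the steps of this tutorial. It only prints the commands and does not run them. The zone name, user name, project name, file name and metadata values are all hypothetical; replace them with the ones from your own instance, and check each command's built-in help before relying on it.

```python
import shlex

# Hypothetical paths and names; replace them with those of your own instance.
ZONE_HOME = "/exampleZone/home/alice"
PROJECT = ZONE_HOME + "/training-area/Project Flamingos"
LOCAL_FILE = "flamingo_counts.csv"


def workflow_commands(project: str, local_file: str) -> list[list[str]]:
    """Return iCommands argument lists that mirror the tutorial steps."""
    remote_file = project + "/" + local_file
    return [
        # 2.1 create the project collection
        ["imkdir", project],
        # 2.2 add a key-value pair to the collection
        ["imeta", "add", "-C", project, "KEYWORD", "flamingos"],
        # 2.3 upload a local file into the collection
        ["iput", local_file, project],
        # 2.4 add a key-value pair to the data object
        ["imeta", "add", "-d", remote_file, "KEYWORD", "counts"],
        # 2.5 give a group read access to the collection and its contents
        ["ichmod", "-r", "read", "oceanviewers", project],
        # 1.1 search data objects by metadata, using the '%' wildcard
        ["imeta", "qu", "-d", "KEYWORD", "like", "%flamingo%"],
        # 1.2 download the data object to the current directory
        ["iget", remote_file, "."],
    ]


for args in workflow_commands(PROJECT, LOCAL_FILE):
    # shlex.join quotes arguments containing spaces for safe pasting.
    print(shlex.join(args))
```

Keeping each command as a list of arguments, rather than one long string, avoids quoting mistakes with collection names that contain spaces, such as "Project Flamingos".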
❦ Epilogue
Well done for having completed this walk-through!
If you need help configuring iBridges for your local Yoda instance, please refer to these instructions.
For next steps with regard to RDM, iRODS, or other research services, you can:
- visit our documentation pages:
- Yoda: Yoda Hosting
- iRODS: iRODS
- visit our research services web page: https://www.surf.nl/en/research-it
- or contact our service desk through the Service Desk Portal
Thank you for your attention, and we hope we have been of help to you today.
Complementary information:
- iBridges is open source, and you can view the code and advanced documentation on the GitHub page: https://github.com/chStaiger/iBridges-Gui