This tutorial lets you practise the typical Research Data Management (RDM) flows that iBridges supports.
If you need help configuring iBridges for your local Yoda instance, please refer to these instructions.
iBridges is a graphical user interface for an iRODS instance (i.e.: it is an iRODS client). Typically, an IT department will offer an iRODS or Yoda instance for your institution, which you can access through a choice of client tools, including iBridges and iCommands.
For this tutorial, we will assume that your institute is offering Yoda. Yoda is built in such a way that it governs an underlying iRODS server, offering its own web interface to steer its predefined RDM flow. With iBridges, you can work on that same iRODS server, talking to it in iRODS terms (i.e.: the iRODS protocol). We will therefore use iRODS terminology throughout this tutorial.
As a brief reminder, data objects are the combination of a file (dataset) and its associated metadata. In terms of RDM best practices, that metadata belongs to the dataset; it is an intrinsic part of it. Data objects are grouped together into collections, which may be thought of as folders and can also have associated metadata. Metadata consists of pieces of information that accompany a dataset or collection. They describe it and also make it findable by searching through those pieces of information. The main idea behind this way of collaborating around data is that scientists can gather data and share it with others, each working at their own pace. Thus, years after a dataset has been put together, somebody else can find it and reuse it in novel research projects.
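The relationship between collections, data objects and metadata can be sketched in Python. This is a simplified, in-memory model for illustration only. The class names, paths and metadata attributes are hypothetical. Metadata is also reduced to one value per attribute, whereas real iRODS metadata additionally supports units and repeated attributes.

```python
from dataclasses import dataclass, field


@dataclass
class DataObject:
    """A file stored in iRODS together with its own metadata (simplified)."""
    name: str
    metadata: dict = field(default_factory=dict)


@dataclass
class Collection:
    """A folder-like container holding data objects and sub-collections."""
    name: str
    metadata: dict = field(default_factory=dict)
    data_objects: list = field(default_factory=list)
    subcollections: list = field(default_factory=list)


def walk(collection, parent=""):
    """Yield (path, item) for the collection and everything below it."""
    path = f"{parent}/{collection.name}"
    yield path, collection
    for obj in collection.data_objects:
        yield f"{path}/{obj.name}", obj
    for sub in collection.subcollections:
        yield from walk(sub, path)


def find_by_metadata(root, attribute, value):
    """Return the paths of all items whose metadata matches attribute=value."""
    return [path for path, item in walk(root)
            if item.metadata.get(attribute) == value]


# Hypothetical example: one data object inside a project collection
flights = DataObject("flights.csv", {"subject": "flight performance"})
project = Collection("project-x", {"owner": "researcher"}, [flights])
home = Collection("home", subcollections=[project])

print(find_by_metadata(home, "subject", "flight performance"))
# ['/home/project-x/flights.csv']
```

Searching on metadata rather than on names is what makes a dataset findable even when you do not know what its file is called.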
All this interaction is possible with iBridges. In this tutorial, we will be showcasing and exercising these flows as though you are a scientist or a data steward.
This exercise will teach you how to search for and download datasets using the iBridges interface.
In this scenario, we are going to pretend that you are a researcher who knows that a dataset exists somewhere in the iRODS server of your institute. You want to use this dataset, but you only have limited information about it: it involves some flight performance statistics. That is precisely what you need for your research! Let us find it now.
In iBridges, you can search for datasets that have been uploaded to your iRODS instance by you or by others within your groups. You can search by one or more of:
As you follow these steps, please note that the search fields are case sensitive.
The result is probably going to disappoint you: you will not find anything, because the desired item has a different name. Let us try a different search method.
Let's see if the data you are after is in a collection containing the word 'flights'.
The result is probably going to disappoint you this time as well: you will not find anything. Let us try yet a different search method.
Voilà! You should now have at least one result. But do you know whether it is the right one? Let's see how we can inspect the result.
From here on, we will assume that you have found a dataset that contains a .csv file about flights. If you are unsure, please check with the facilitators now.
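A toy catalogue in Python can show why the name and collection searches came up empty and why case matters. This is a hypothetical sketch. The catalogue entries, paths and the "topic" attribute are invented for illustration, and plain substring and equality checks stand in for however iBridges actually builds its queries.

```python
# Hypothetical catalogue entries: (collection path, object name, metadata)
CATALOGUE = [
    ("/zone/home/research-demo/airline-stats", "performance_2019.csv",
     {"topic": "Flights"}),
    ("/zone/home/research-demo/weather", "rainfall.csv",
     {"topic": "Weather"}),
]


def search_by_name(term):
    """Match the term against object names (case sensitive)."""
    return [(coll, name) for coll, name, _ in CATALOGUE if term in name]


def search_by_collection(term):
    """Match the term against collection paths (case sensitive)."""
    return [(coll, name) for coll, name, _ in CATALOGUE if term in coll]


def search_by_metadata(attribute, value):
    """Match an exact attribute/value pair in the metadata (case sensitive)."""
    return [(coll, name) for coll, name, meta in CATALOGUE
            if meta.get(attribute) == value]


print(search_by_name("flights"))               # [] (the file has another name)
print(search_by_collection("flights"))         # [] (no such collection)
print(search_by_metadata("topic", "flights"))  # [] (wrong case)
print(search_by_metadata("topic", "Flights"))  # one hit
```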
As you look at the result you found, you can see that its path follows the pattern /.../vault-.../..., i.e.: it contains a vault- folder. This means that the dataset is safely frozen, so that colleagues can cooperate around it. Remember that, in Yoda, when data is in the Vault, you should always bring it to the Research area before you can work with it.
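You can tell whether a path is in the Vault by checking its components. The Python sketch below assumes the Yoda naming convention suggested by the path pattern above, where group folders are named vault-<group> or research-<group>. The zone name and example paths are hypothetical.

```python
from pathlib import PurePosixPath


def yoda_area(irods_path):
    """Return 'vault', 'research' or 'other' based on the group folder name.

    Assumes Yoda group folders are prefixed with 'vault-' or 'research-'.
    """
    for part in PurePosixPath(irods_path).parts:
        if part.startswith("vault-"):
            return "vault"
        if part.startswith("research-"):
            return "research"
    return "other"


# Hypothetical paths
print(yoda_area("/tempZone/home/vault-demo/flights/original/flights.csv"))  # vault
print(yoda_area("/tempZone/home/research-demo/flights.csv"))                # research
```

A result classified as "vault" is one you would first bring to the Research area before working on it.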
Unfortunately, iBridges does not yet support renaming or moving collections or objects within the same server. So please use the Yoda web portal now to bring the collection from the Vault to the Research area, as you learned before. After that, come back to iBridges.
Now that you have found data in the Vault and brought it to the Research area, you can work with it normally. Following our scenario, we will now assume that you want to process the data from the dataset on your laptop. This requires that you first download the dataset. Let us do that with iBridges now.
You have downloaded a dataset to your laptop. In order to process it, you will have to run a program on it.
The dataset we have prepared includes a "read-me" file. Scan quickly through it to understand what data is there. Since this dataset consists of a .csv file, you can use your favourite spreadsheet program to filter it down to a subset and calculate something from it.
Here is a sample exercise you can carry out now: create a new file that contains only the subset of the original data with flights departing from the state of New York. Save the file on your laptop.
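If you prefer a script to a spreadsheet, you can produce the same subset with Python's standard csv module. This sketch rests on assumptions: the column name ORIGIN_STATE, the state code NY and the file names flights.csv and flights_ny.csv are all hypothetical. Check the read-me file for the real column names before using it.

```python
import csv
import io


def subset_by_state(src, dst, state="NY", column="ORIGIN_STATE"):
    """Copy the header and every row whose `column` equals `state`.

    `src` and `dst` are open text files (or file-like objects).
    Returns the number of data rows written.
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    kept = 0
    for row in reader:
        if row[column] == state:
            writer.writerow(row)
            kept += 1
    return kept


# Self-contained demonstration on a tiny in-memory sample
sample = io.StringIO("FLIGHT,ORIGIN_STATE\nAA1,NY\nDL2,CA\nB63,NY\n")
result = io.StringIO()
print(subset_by_state(sample, result))  # 2

# On real files (hypothetical names), open them with newline="":
# with open("flights.csv", newline="") as src, \
#         open("flights_ny.csv", "w", newline="") as dst:
#     subset_by_state(src, dst)
```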
⟡ ⟡ ⟡
You have now completed this section and should understand the flow to find and reuse data. Feel free to move on to the next exercise at your own pace, but make sure you have answered the questions on the right of this page.
In this scenario, we are going to pretend that you are a researcher who has been collecting data and wants to store it in iRODS through the iBridges interface. You will be assuming the role of a seasoned data practitioner who completes the process without intervention from a data steward.
By the end of the exercise you will know how to import data, creating and editing the metadata as required.
You can work with any dataset you may already have on your laptop. For example, you can continue working with the subset of flights you created in the previous section. Alternatively, you can download something from the Internet to your laptop. For example, you can use the data portal from the Gemeente Amsterdam to search for data that may appeal to you: https://data.amsterdam.nl/datasets/ (Please verify the dataset license before you use it!)
Yoda manages its metadata in its own way (i.e.: in a .json file and in iRODS), and you would NOT want to disrupt what Yoda does. For a smooth flow during the following exercises, please work with a dataset that consists of a single file only.
The default page in iBridges is the 'Browser' tab showing the contents of your home directory, which is listed under 'iRODS path'.
We will now pretend that you are working in a project of your own. You will therefore need to create a collection for that project. Here are the steps you need to follow:
Enter the training-area collection by double clicking on its name. If you make a mistake and enter the wrong space, use the blue return arrow next to the iRODS path to return to your prior location (or the orange home button to return to your home directory).
The dataset's metadata is crucial when you are working within RDM best practices. It will ensure that your dataset is reusable in the future. So it is best to start with it, even before the data exists in iRODS. Let us tackle that right now by adding metadata at the collection level.
When using Yoda, you will want to keep Yoda's functionality and flows working. Because of the way Yoda handles metadata, you should only modify metadata on folders managed by Yoda (i.e.: anything within a Research area or a Vault area) through Yoda's web portal.
Therefore, please go to the web portal and add metadata to the folder as you learned before. Then come back to iBridges.
Now that you have some metadata defined for your project collection, you are going to import the dataset you selected into the project folder that you created under the training-area collection a few steps ago. Remember? You called it Project <something>. For this exercise, and for simplicity's sake, it is enough to upload one or two files no larger than a few megabytes as though they were a full dataset; adding more would be overkill today.
training-area collection
In iRODS (and therefore also in iBridges), you can add or modify metadata at the level of the collection and for individual data objects. Each metadata tag is composed of an attribute (the name of the tag), a value and, optionally, a unit.
(In iRODS terminology, these are referred to as AVU triplets, which stands for Attribute-Value-Unit.)
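An AVU triplet maps naturally onto a small record type. The Python sketch below illustrates the concept only and is not the iBridges or iRODS API. The attribute names and values are hypothetical. It reflects two properties of iRODS metadata: the unit is optional, and the same attribute may appear more than once with different values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AVU:
    """One metadata tag: attribute, value and an optional unit."""
    attribute: str
    value: str
    unit: Optional[str] = None


def values_for(avus, attribute):
    """Return every value stored under `attribute`, in insertion order."""
    return [avu.value for avu in avus if avu.attribute == attribute]


# Hypothetical metadata for a flights data object
flights_metadata = [
    AVU("subject", "flight performance"),
    AVU("keyword", "aviation"),
    AVU("keyword", "delays"),
    AVU("file_size", "2.4", "MB"),
]

print(values_for(flights_metadata, "keyword"))  # ['aviation', 'delays']
```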
You have now imported one or more files that you can use for your research into your project folder. Think about what sort of metadata should be stored at the level of the individual data files.
Yoda manages its permissions in its own way (i.e.: in terms of groups, projects, etc.), and you would NOT want to disrupt its flows. For a smooth flow for all colleagues, do not change any permissions in the Research or Vault areas. You can change permissions on folders or files that you create yourself.
In iRODS, data access permissions are managed through groups. Are you curious about which groups you belong to? At the top of the iBridges window, switch to the 'Info' tab to view your username, groups and server information for the iRODS instance you are connected to.
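Permissions can be granted both to groups and to individual users. Your effective access to an item is therefore the highest level granted to you directly or to any group you belong to. The Python sketch below illustrates that rule using a simplified ordering of the classic iRODS access levels (null, read, write, own). The user names, group names and access list are hypothetical. Newer iRODS versions also define finer-grained levels that are not modelled here.

```python
# Simplified ordering of classic iRODS access levels, lowest to highest
LEVELS = ["null", "read", "write", "own"]


def effective_access(user, user_groups, acl):
    """Return the highest level granted to `user` directly or via a group.

    `acl` maps a user or group name to its access level on one item.
    """
    principals = {user, *user_groups}
    best = "null"
    for principal, level in acl.items():
        if principal in principals and LEVELS.index(level) > LEVELS.index(best):
            best = level
    return best


# Hypothetical access list for a project folder
acl = {"alice": "own", "research-demo": "write", "guests": "read"}

print(effective_access("bob", ["guests"], acl))                     # read
print(effective_access("carol", ["guests", "research-demo"], acl))  # write
print(effective_access("dave", [], acl))                            # null
```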
Now your project folder is complete with a dataset and high quality metadata. Let us now ensure that other intended users will have the right level of access to these files.
In the training-area collection, click to select your project folder (i.e.: something like "Project Z", "Project Peter" or "Project Flamingos", as you created it before).
⟡ ⟡ ⟡
Well done! You have now completed an RDM workflow, taking good care over the findability and accessibility of your dataset.
Well done for having completed this walk-through!
For next steps with regard to RDM, iRODS, or other research services, you can:
Thank you for your attention; we hope this tutorial has been helpful to you.