General information
Metadata management is one of the most powerful iRODS features available in iCE. With a few simple iCommands you can attach as many metadata tags as you need to your data objects (files) and collections (directories). Custom queries then let you locate and list your files by one or more metadata attributes and values, even when the files or folders reside in different locations or resources.
How to add metadata
To learn how to add metadata to your data objects, we will walk through a simple use case with three data objects (files) located in different collections (directories). We want to add a metadata tag to each of these files that designates its type and carries a timestamp.
First, let's check the existing metadata tags on the three files by executing the following:
$ icd /snow/home/icepocsurf/
$ imeta ls -d extract-dir/testFile1.txt
AVUs defined for dataObj /snow/home/icepocsurf/extract-dir/testFile1.txt:
attribute: irods::access_time
value: 1681302055
units: irods::storage_tiering::migration_scheduled
$ imeta ls -d extract-dir/testFile2.txt
AVUs defined for dataObj /snow/home/icepocsurf/extract-dir/testFile2.txt:
attribute: irods::access_time
value: 1681301112
units: irods::storage_tiering::migration_scheduled
$ imeta ls -d transform-dir/testFile3.txt
AVUs defined for dataObj /snow/home/icepocsurf/transform-dir/testFile3.txt:
attribute: irods::access_time
value: 1681302284
units: irods::storage_tiering::migration_scheduled
Here, the object type descriptor -d specifies that we are working with data objects (files). The other options are -C for collections, -R for resources and -u for users.
As you can see, all of the files already contain system metadata added automatically by iCE.
Each metadata tag consists of three parts: the attribute, the value, and the units. Together they form what imeta calls an AVU (attribute, value, units).
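To make the three-part structure concrete, the following minimal sketch turns the output of imeta ls into one tab-separated line per AVU. The function name avu_to_tsv is hypothetical (it is not an iCommand), and the sketch assumes that imeta prints each of the three fields on its own line, prefixed with "attribute:", "value:" and "units:".

```shell
# Convert `imeta ls` output (read from stdin) into one
# "attribute<TAB>value<TAB>units" line per AVU.
# avu_to_tsv is a hypothetical helper name, not part of iRODS.
# Assumption: each field is printed on its own line with the
# prefixes "attribute:", "value:" and "units:".
avu_to_tsv() {
  awk '
    /^attribute:/ { sub(/^attribute: ?/, ""); attr = $0; next }
    /^value:/     { sub(/^value: ?/, "");     val  = $0; next }
    /^units:/     { sub(/^units: ?/, "");     printf "%s\t%s\t%s\n", attr, val, $0 }
  '
}
```

It could be used as: imeta ls -d extract-dir/testFile1.txt | avu_to_tsv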
To add a new metadata tag to each of the three files we will execute the following iCommands:
$ imeta add -d extract-dir/testFile1.txt raw-data 01-05-2023 time
$ imeta add -d extract-dir/testFile2.txt raw-data 01-05-2023 time
$ imeta add -d transform-dir/testFile3.txt raw-data 01-05-2023 time
These commands all follow the same syntax. After "imeta add -d" we specify the data object (file), then the attribute name (raw-data), the value (01-05-2023), and the units (time). To tag a collection (directory) instead, use -C in place of -d.
For more detailed information about the iRODS imeta command, please check the official iRODS documentation.
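When many files need the same tag, the repeated imeta add calls can be wrapped in a loop. The sketch below is one possible way to do this, not an iCE-provided tool: tag_data_objects is a hypothetical function name, and the IMETA variable is an assumed override hook that defaults to the real imeta iCommand and is set to echo here so the example only prints what it would run.

```shell
# Attach the same attribute/value/units triple to several data objects.
# tag_data_objects is a hypothetical helper; IMETA is an assumed override
# hook (default: the imeta iCommand) that allows a dry run.
tag_data_objects() {
  attr=$1
  value=$2
  units=$3
  shift 3
  for obj in "$@"; do
    # Stop at the first failure so a partial run is noticed.
    "${IMETA:-imeta}" add -d "$obj" "$attr" "$value" "$units" || return 1
  done
}

# Dry run: prints the arguments that would be passed to imeta.
IMETA=echo tag_data_objects raw-data 01-05-2023 time \
  extract-dir/testFile1.txt extract-dir/testFile2.txt transform-dir/testFile3.txt
```

Dropping the IMETA=echo prefix would execute the real imeta add commands from the current iRODS working collection.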
Now that we have added our metadata tags we can verify our updates by executing the following:
$ imeta ls -d extract-dir/testFile1.txt
AVUs defined for dataObj /snow/home/icepocsurf/extract-dir/testFile1.txt:
attribute: irods::access_time
value: 1681302055
units: irods::storage_tiering::migration_scheduled
----
attribute: raw-data
value: 01-05-2023
units: time
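In a script, this visual check can be replaced by a test on the imeta ls output. The following is a minimal sketch under the assumption that imeta prints "attribute:" and "value:" on separate lines; has_avu is a hypothetical helper name.

```shell
# Exit 0 if the `imeta ls` output on stdin contains an AVU with the given
# attribute name ($1) and value ($2), otherwise exit 1.
# has_avu is a hypothetical helper, not part of iRODS.
has_avu() {
  awk -v want_attr="$1" -v want_val="$2" '
    /^attribute:/ { sub(/^attribute: ?/, ""); attr = $0; next }
    /^value:/     {
      sub(/^value: ?/, "")
      # Appending "" forces string (not numeric) comparison.
      if ((attr "") == (want_attr "") && ($0 "") == (want_val "")) found = 1
    }
    END { exit !found }
  '
}
```

For example: imeta ls -d extract-dir/testFile1.txt | has_avu raw-data 01-05-2023 && echo "tag present"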
How to query metadata
Data objects can be queried based on their metadata using the iCommand iquest. The following simple example requests the collection name and data object name of all data objects with a 'raw-data' metadata tag:
$ iquest "SELECT COLL_NAME, DATA_NAME WHERE META_DATA_ATTR_NAME = 'raw-data' "
COLL_NAME = /snow/home/icepocsurf/extract-dir
DATA_NAME = testFile1.txt
------------------------------------------------------------
COLL_NAME = /snow/home/icepocsurf/extract-dir
DATA_NAME = testFile2.txt
------------------------------------------------------------
COLL_NAME = /snow/home/icepocsurf/extract-dir
DATA_NAME = testFile4.txt
------------------------------------------------------------
COLL_NAME = /snow/home/icepocsurf/transform-dir
DATA_NAME = testFile3.txt
------------------------------------------------------------
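If the query result is to be passed on to other commands, full logical paths are often more convenient than separate COLL_NAME and DATA_NAME lines. The sketch below joins each pair into a single path; iquest_to_paths is a hypothetical helper name, and it assumes the default iquest output layout in which each COLL_NAME line is followed by its DATA_NAME line.

```shell
# Join the COLL_NAME / DATA_NAME pairs printed by iquest (read from stdin)
# into one full logical path per line.
# iquest_to_paths is a hypothetical helper, not part of iRODS.
iquest_to_paths() {
  awk '
    /^COLL_NAME = / { coll = substr($0, 13) }            # text after "COLL_NAME = "
    /^DATA_NAME = / { print coll "/" substr($0, 13) }    # text after "DATA_NAME = "
  '
}
```

For example: iquest "SELECT COLL_NAME, DATA_NAME WHERE META_DATA_ATTR_NAME = 'raw-data'" | iquest_to_paths. As an alternative, iquest also accepts a printf-style format string as its first argument, which may make this post-processing unnecessary; see iquest -h for details.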
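The iquest result also lists testFile4.txt, a data object that was not tagged in the steps of this walkthrough. To refine the query further, we can also specify the value of the 'raw-data' metadata tag: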
$ iquest "SELECT COLL_NAME, DATA_NAME WHERE META_DATA_ATTR_NAME = 'raw-data' AND META_DATA_ATTR_VALUE = '02-05-2023'"
COLL_NAME = /snow/home/icepocsurf/extract-dir
DATA_NAME = testFile4.txt
------------------------------------------------------------
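Since both queries differ only in the optional value condition, the query string can be assembled by a small helper. This is a sketch under stated assumptions: build_meta_query is a hypothetical function name, and it performs no escaping, so it is only suitable for attribute names and values that contain no single quotes.

```shell
# Print a query string that selects data objects by metadata attribute
# name ($1) and, optionally, by value ($2).
# build_meta_query is a hypothetical helper; it does NOT escape its
# arguments, so avoid names or values containing single quotes.
build_meta_query() {
  query="SELECT COLL_NAME, DATA_NAME WHERE META_DATA_ATTR_NAME = '$1'"
  if [ -n "${2:-}" ]; then
    query="$query AND META_DATA_ATTR_VALUE = '$2'"
  fi
  printf '%s\n' "$query"
}

# Prints the refined query used in the example above.
build_meta_query raw-data 02-05-2023
```

The result can be handed directly to iquest, for example: iquest "$(build_meta_query raw-data 02-05-2023)"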