Meant for:

  • iRODS admin

Requirements:


Packaging data can be useful when a dataset or folder/collection consists of many (small or large) files and needs to be archived, for example for publication or to reduce storage costs. However, packaging data before upload can be a tedious operation for most users. This page shows how to enable the packaging workflow using the SURF BagIt iRODS ruleset. Once it is enabled, iRODS users can package datasets as described in "How to package and archive datasets using BagIt workflows in iRODS".

Enabling the packaging workflow

iRODS users can mark collections for packaging. The SURFbagitBatch iRODS rule (installed by SURF iRODS admins) searches for collections that are marked for packaging and performs the packaging workflow asynchronously in the background.

There are two ways to enable this workflow. The first is to run the SURFbagitBatch rule manually, which searches for candidate collections a single time. For example:

user@login:~$ cat bagit.r
bagitRule {

  *CMD = 'bagit'

  *response = SURFbagitBatch(*DEFAULT_RESC, *ADMIN_USER, *CMD);
  writeLine("stdout", "*response");

}
INPUT *DEFAULT_RESC="surfResc", *ADMIN_USER="rods#surfZone"
OUTPUT ruleExecOut

Note that this rule needs to be run as an iRODS admin:

user@login:~$ irule -F bagit.r

However, you typically want the SURFbagitBatch rule to run regularly, without invoking it manually each time. To do this, turn the rule above into a delayed rule that iRODS executes at a fixed frequency:

user@login:~$ cat bagitRun.r
bagitRule {
  *CMD = 'bagit'
  delay("<EF>2h</EF>") {
    *response = SURFbagitBatch(*DEFAULT_RESC, *ADMIN_USER, *CMD);
    writeLine("stdout", "*response");
  }
}
INPUT *DEFAULT_RESC="surfResc", *ADMIN_USER="rods#surfZone"
OUTPUT ruleExecOut

user@login:~$ irule -F bagitRun.r

This ensures that iRODS executes the SURFbagitBatch rule every 2 hours, as set by the <EF>2h</EF> (execution frequency) argument of delay().

Enabling the unpackaging workflow

The unpackaging workflow is enabled in the same way as the packaging workflow, with a delayed rule that calls SURFunbagitBatch and sets *CMD to 'unbagit':

user@login:~$ cat unbagitRun.r
unbagitRule {

  *CMD = 'unbagit'
  delay("<EF>2h</EF>") {
    *response = SURFunbagitBatch(*DEFAULT_RESC, *ADMIN_USER, *CMD);
    writeLine("stdout", "*response");
  }
}
INPUT *DEFAULT_RESC="surfResc", *ADMIN_USER="rods#surfZone"
OUTPUT ruleExecOut

user@login:~$ irule -F unbagitRun.r
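The two delayed rules differ only in the rule name, the value of *CMD and the batch function they call. If you maintain these files for several zones or want a different frequency, a small shell sketch like the one below could write either file from parameters. The function name write_bagit_rule and its argument order are hypothetical and not part of the SURF ruleset; the text it writes mirrors bagitRun.r and unbagitRun.r above, with the resource, admin user and frequency substituted.

```shell
#!/bin/sh
# Hypothetical helper (not part of the SURF BagIt ruleset): writes a delayed
# iRODS rule file equivalent to bagitRun.r or unbagitRun.r on this page.
# Usage: write_bagit_rule <bagit|unbagit> <resource> <admin_user#zone> <frequency> <output_file>
write_bagit_rule() {
  if [ "$#" -ne 5 ]; then
    echo "usage: write_bagit_rule <bagit|unbagit> <resource> <user#zone> <frequency> <file>" >&2
    return 1
  fi
  mode=$1 resc=$2 admin=$3 freq=$4 out=$5
  # Only the rule name and batch function change between the two workflows.
  case "$mode" in
    bagit)   rule_name=bagitRule;   batch_fn=SURFbagitBatch ;;
    unbagit) rule_name=unbagitRule; batch_fn=SURFunbagitBatch ;;
    *) echo "unknown mode: $mode (expected bagit or unbagit)" >&2; return 1 ;;
  esac
  # Unquoted EOF so that $mode, $freq, etc. are substituted; the iRODS
  # variables (*CMD, *response, ...) contain no '$' and are written verbatim.
  cat > "$out" <<EOF
$rule_name {
  *CMD = '$mode'
  delay("<EF>$freq</EF>") {
    *response = $batch_fn(*DEFAULT_RESC, *ADMIN_USER, *CMD);
    writeLine("stdout", "*response");
  }
}
INPUT *DEFAULT_RESC="$resc", *ADMIN_USER="$admin"
OUTPUT ruleExecOut
EOF
}

# Example: regenerate the two rule files shown above (overwrites them in the current directory).
write_bagit_rule bagit   surfResc 'rods#surfZone' 2h bagitRun.r
write_bagit_rule unbagit surfResc 'rods#surfZone' 2h unbagitRun.r
```

Submit the generated files with irule -F as shown above. Each irule -F call queues a new delayed rule, so submitting the same file twice schedules the batch twice; the iCommands iqstat and iqdel <ruleId> can be used to list and remove queued delayed rules.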