
File bundling while archiving

Note: This guide uses the iCommands client. See the other guides on installing and setting up the client.

When archiving files, it is worth thinking about bundling them. Bundling means combining a large number of files into a single file. On Windows this usually means zipping the files; on Linux it usually means tarring them. Bundling is not technically required for archiving, but it makes it easier to store and download files in one go, especially if the bundle contains all the information from one project. You can bundle either before uploading to iRODS or afterwards. Both methods are described below.

Bundling before iRODS

You can bundle your files before uploading them to iRODS, using your own favourite tools. These include, but are not limited to:

  • WinZip, 7-Zip (Windows)
  • tar, gzip (Linux)

You will have to do the bundling yourself and upload the bundle to iRODS afterwards. We recommend that you keep track of what is in the bundle. You could consider:

  • Creating a list of the files in the bundle and uploading it alongside the bundle
  • Adding metadata in iRODS to describe what is in the bundle
  • Adding custom files that describe what is in the bundle, such as the default YODA metadata scheme, or something along the lines of the ISA-JSON format
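The sketch below combines bundling before upload with the first suggestion above, a list of the files in the bundle. It is a minimal Python example that assumes a local project directory and a tar.gz bundle. It writes the bundle plus a manifest that lists every file with a SHA-256 checksum in the same `sha2:<base64>` form that iRODS uses. The helper name `make_bundle` and the manifest layout (relative path, a tab, then the checksum) are hypothetical choices, not a WUR convention.

```python
from __future__ import annotations

import base64
import hashlib
import tarfile
from pathlib import Path


def sha2_checksum(path: Path) -> str:
    """SHA-256 of a file, base64-encoded with the 'sha2:' prefix used by iRODS."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read in 1 MiB chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: fh.read(1024 * 1024), b""):
            digest.update(chunk)
    return "sha2:" + base64.b64encode(digest.digest()).decode("ascii")


def make_bundle(source_dir: str, bundle_path: str, manifest_path: str) -> list[str]:
    """Hypothetical helper: tar+gzip source_dir and write a manifest of its files.

    Each manifest line is '<relative path>\t<checksum>'.
    Returns the relative paths that were bundled.
    """
    root = Path(source_dir)
    # The file list is fixed before the bundle is written.
    files = sorted(p for p in root.rglob("*") if p.is_file())
    relative = [p.relative_to(root).as_posix() for p in files]

    with tarfile.open(bundle_path, "w:gz") as tar:
        for path, rel in zip(files, relative):
            # Store paths relative to source_dir so the bundle extracts cleanly.
            tar.add(path, arcname=rel)

    with open(manifest_path, "w", encoding="utf-8") as manifest:
        for path, rel in zip(files, relative):
            manifest.write(f"{rel}\t{sha2_checksum(path)}\n")

    return relative
```

You would then upload both files with `iput`. Keeping the manifest as a separate small file means you can see what is in the bundle without retrieving the bundle itself.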

WUR bundling and archiving after uploading to iRODS

In this situation we assume that you have uploaded an unbundled collection of files to iRODS and want to archive them to tape as a bundle. We created a rule that creates the bundle, tracks its status and archives it for you. You can call the rule like this:

irule -r irods_rule_engine_plugin-irods_rule_language-instance rdm_tar_and_archive_collection '*collection=/WCDSacc/courses/somedir/subdir' ruleExecOut

This will:

  • create a bundle inside iRODS
  • create a file listing the contents of the bundle
  • archive both files to tape
  • remove all original files from the collection
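You may want to trigger the rule from a script, for example for several collections. This minimal Python sketch wraps the same irule call shown above with `subprocess`. The rule engine instance and rule name are copied from that command. The helper names `tar_and_archive_command` and `request_tar_and_archive` are hypothetical. Because the arguments are passed as a list and no shell is involved, the `*collection=...` argument does not need quoting.

```python
from __future__ import annotations

import subprocess

# Rule engine instance and rule name, copied from the irule command above.
RULE_ENGINE_INSTANCE = "irods_rule_engine_plugin-irods_rule_language-instance"
RULE_NAME = "rdm_tar_and_archive_collection"


def tar_and_archive_command(collection: str) -> list[str]:
    """Build the irule argument list for bundling and archiving one collection."""
    if not collection.startswith("/"):
        raise ValueError("collection must be an absolute iRODS path")
    return [
        "irule",
        "-r",
        RULE_ENGINE_INSTANCE,
        RULE_NAME,
        # Passed as one argument without a shell, so the '*' needs no quoting.
        f"*collection={collection}",
        "ruleExecOut",
    ]


def request_tar_and_archive(collection: str) -> str:
    """Run irule (needs a configured iCommands client) and return its stdout."""
    result = subprocess.run(
        tar_and_archive_command(collection),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

To process several collections, you could loop over their paths and call `request_tar_and_archive` for each one. Keep in mind that a collection can no longer be changed once the rule has been called for it.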

Be aware of the following:

  • Metadata added to files in the collection will NOT be preserved
  • ACLs of individual files will NOT be preserved
  • To ensure integrity, you can no longer change the collection or the files in it once you have given the command to bundle.

If needed, we can implement the above in the future.

Tracking progress

After executing the rule the collection will have the following metadata:

  • WUR_RDM:collection:archive_tar_status = archive_tar_requested

You can track the WUR_RDM:collection:archive_tar_status attribute to make sure your files are archived. In the end, the original files are removed, and only the archived tar file and a file listing the contents of your original collection remain.
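If you would rather not check the status by hand, this minimal Python sketch reads the attribute with `imeta ls -C <collection> <attribute>` and parses the plain-text output. The parser assumes the output format shown in this guide (`attribute:` / `value:` / `units:` lines separated by `----`). The final status `completed_tarred_archived_cleaned` comes from the state diagram. The function names and the 10-minute polling interval are hypothetical choices.

```python
from __future__ import annotations

import subprocess
import time

STATUS_ATTRIBUTE = "WUR_RDM:collection:archive_tar_status"
FINAL_STATUS = "completed_tarred_archived_cleaned"


def parse_imeta_ls(output: str) -> dict[str, str]:
    """Turn 'imeta ls -C' output into an {attribute: value} dictionary.

    If an attribute occurs more than once, the last value wins.
    """
    avus: dict[str, str] = {}
    attribute = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("attribute:"):
            attribute = line[len("attribute:"):].strip()
        elif line.startswith("value:") and attribute is not None:
            avus[attribute] = line[len("value:"):].strip()
            attribute = None
    return avus


def wait_until_archived(collection: str, interval_s: int = 600) -> None:
    """Poll the collection's archive_tar_status until the final status appears."""
    while True:
        output = subprocess.run(
            ["imeta", "ls", "-C", collection, STATUS_ATTRIBUTE],
            check=True,
            capture_output=True,
            text=True,
        ).stdout
        status = parse_imeta_ls(output).get(STATUS_ATTRIBUTE)
        print(f"{collection}: {status}")
        if status == FINAL_STATUS:
            return
        time.sleep(interval_s)
```

Writing to tape can take a while, so a long polling interval is kinder to the server than checking every few seconds.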

During this process, the regular file archiving process is also run for the generated bundle. Below you can find the metadata flow for archive_tar_status on the collection and archive_status on the generated bundle.

%%---
%%title: archive_status state diagram 
%%---
stateDiagram-v2
    direction LR;

    state archive_tar_status {
        direction LR;
        archive_tar_requested --> all_tars_created
        all_tars_created --> archive_requested
        completed_and_hot_deleted --> all_tars_archived
        all_tars_archived --> completed_tarred_archived_cleaned
    }

    state archive_status {
        archive_requested --> archive_performed
        archive_performed --> archive_completed
        archive_completed --> completed_and_hot_deleted

    }

When the status is completed_tarred_archived_cleaned, we consider the files archived, and you can safely delete your source files.
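The state diagram can also be written down as data, which is handy when a script needs to report where a collection is in the process. This minimal Python sketch lists the statuses in the order the diagram shows. The collection's archive_tar_status steps are interleaved with the bundle's archive_status steps. The names `ARCHIVE_FLOW`, `next_status` and `is_safely_archived` are hypothetical. The short attribute names are the diagram labels, not necessarily the full iRODS attribute names.

```python
from __future__ import annotations

# The metadata flow from the state diagram above, in order. Each entry is
# (object that carries the attribute, attribute label, value).
ARCHIVE_FLOW = [
    ("collection", "archive_tar_status", "archive_tar_requested"),
    ("collection", "archive_tar_status", "all_tars_created"),
    ("bundle", "archive_status", "archive_requested"),
    ("bundle", "archive_status", "archive_performed"),
    ("bundle", "archive_status", "archive_completed"),
    ("bundle", "archive_status", "completed_and_hot_deleted"),
    ("collection", "archive_tar_status", "all_tars_archived"),
    ("collection", "archive_tar_status", "completed_tarred_archived_cleaned"),
]
_VALUES = [value for _, _, value in ARCHIVE_FLOW]


def next_status(value: str) -> str | None:
    """Return the status expected after `value`, or None at the end of the flow."""
    index = _VALUES.index(value)  # raises ValueError for unknown statuses
    return _VALUES[index + 1] if index + 1 < len(_VALUES) else None


def is_safely_archived(collection_status: str) -> bool:
    """True once the collection has reached the final archive_tar_status."""
    return collection_status == "completed_tarred_archived_cleaned"
```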

The data in your collection

Suppose you have a collection structure like the one below, as listed by ils -rl:

/WCDSacc/courses/15012025/luijs002/showtarring:
  luijs002          0 hot_1;local_filestore_1745            0 2025-04-14.13:54 & file1.txt
  C- /WCDSacc/courses/15012025/luijs002/showtarring/dir1
/WCDSacc/courses/15012025/luijs002/showtarring/dir1:
  luijs002          0 hot_1;local_filestore_1745            0 2025-04-14.13:54 & file2.txt
  luijs002          0 hot_1;local_filestore_1745            0 2025-04-14.13:54 & file3.txt
  C- /WCDSacc/courses/15012025/luijs002/showtarring/dir2
/WCDSacc/courses/15012025/luijs002/showtarring/dir2:
  luijs002          0 hot_1;local_filestore_1745            0 2025-04-14.13:54 & file4.txt
  luijs002          0 hot_1;local_filestore_1745            0 2025-04-14.13:54 & file5.txt

When you execute the rdm_tar_and_archive_collection rule, nothing changes at first except the metadata of the collection:

AVUs defined for collection /WCDSacc/courses/15012025/luijs002/showtarring:
attribute: WUR_RDM:collection:archive_tar_status
value: archive_tar_requested
units:
----
attribute: WUR_RDM:collection:tarring_strategy
value: simple
units:

After a while, the collection will look like this:

/WCDSacc/courses/15012025/luijs002/showtarring:
  luijs002          0 hot_1;local_filestore_1745            0 2025-04-14.13:54 & file1.txt
  irods             0 hot_1;local_filestore_1745        10240 2025-04-14.13:58 & generatedtar.tar
  irods             0 hot_1;local_filestore_1745          610 2025-04-14.13:58 & tarcontents.list.txt
  C- /WCDSacc/courses/15012025/luijs002/showtarring/dir1
/WCDSacc/courses/15012025/luijs002/showtarring/dir1:
  luijs002          0 hot_1;local_filestore_1745            0 2025-04-14.13:54 & file2.txt
  luijs002          0 hot_1;local_filestore_1745            0 2025-04-14.13:54 & file3.txt
  C- /WCDSacc/courses/15012025/luijs002/showtarring/dir2
/WCDSacc/courses/15012025/luijs002/showtarring/dir2:
  luijs002          0 hot_1;local_filestore_1745            0 2025-04-14.13:54 & file4.txt
  luijs002          0 hot_1;local_filestore_1745            0 2025-04-14.13:54 & file5.txt

The collection then has this metadata:

AVUs defined for collection /WCDSacc/courses/15012025/luijs002/showtarring:
attribute: WUR_RDM:collection:archive_tar_status
value: all_tars_created
units:
----
attribute: WUR_RDM:collection:tarring_strategy
value: simple
units:

The tarcontents.list.txt file will have the following data in it:

[   ['/WCDSacc/courses/15012025/luijs002/showtarring/dir1', 'file2.txt', 'sha2:47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU='], 
    ['/WCDSacc/courses/15012025/luijs002/showtarring/dir1', 'file3.txt', 'sha2:47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU='], 
    ['/WCDSacc/courses/15012025/luijs002/showtarring/dir2', 'file4.txt', 'sha2:47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU='], 
    ['/WCDSacc/courses/15012025/luijs002/showtarring/dir2', 'file5.txt', 'sha2:47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU='], 
    ['/WCDSacc/courses/15012025/luijs002/showtarring', 'file1.txt', 'sha2:47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=']]
If you calculated the checksums of the files before tarring them, you can use this list to check them again after downloading the files. There is currently no built-in automation that performs this check for you.
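Until such automation exists, this minimal Python sketch does the check by hand. It reads tarcontents.list.txt, which is a Python-style list of `[collection, file name, checksum]` entries, with `ast.literal_eval`. It then recomputes each file's SHA-256 in the `sha2:<base64>` form. In the example above every checksum is identical because the example files are empty: `47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=` is the SHA-256 of zero bytes. ASSUMPTION: the guide does not describe the path layout inside generatedtar.tar. The function therefore assumes you extracted the bundle so that each file sits at `<local_dir>/<collection path relative to the original collection>/<file name>`. Inspect the bundle with `tar -tf generatedtar.tar` first and adjust if needed. The function names are hypothetical.

```python
import ast
import base64
import hashlib
from pathlib import Path


def sha2_checksum(path: Path) -> str:
    """SHA-256 of a file, base64-encoded with the 'sha2:' prefix used by iRODS."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1024 * 1024), b""):
            digest.update(chunk)
    return "sha2:" + base64.b64encode(digest.digest()).decode("ascii")


def verify_extracted(contents_file: str, root_collection: str, local_dir: str) -> list:
    """Check extracted files against the entries in tarcontents.list.txt.

    ASSUMPTION: each file was extracted to
    <local_dir>/<collection path relative to root_collection>/<file name>.
    Returns a list of problems; an empty list means everything matched.
    """
    entries = ast.literal_eval(Path(contents_file).read_text(encoding="utf-8"))
    root = root_collection.rstrip("/")
    problems = []
    for collection, name, expected in entries:
        if collection != root and not collection.startswith(root + "/"):
            problems.append(f"outside {root}: {collection}/{name}")
            continue
        relative = collection[len(root):].lstrip("/")
        local = Path(local_dir, relative, name)
        if not local.is_file():
            problems.append(f"missing: {local}")
        elif sha2_checksum(local) != expected:
            problems.append(f"checksum mismatch: {local}")
    return problems
```

For the example collection you would call `verify_extracted("tarcontents.list.txt", "/WCDSacc/courses/15012025/luijs002/showtarring", "extracted")`. Here `extracted` stands for whatever local directory you unpacked the bundle into.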

In the end, the collection will look like this. All the original files have been removed, the now-empty subcollections dir1 and dir2 remain, and the tar file itself has been moved to tape, where it is stored as two replicas (tape_1 and tape_2).

/WCDSacc/courses/15012025/luijs002/showtarring:
  irods             1 tape_1;s3nlwcdsacc01        10240 2025-04-14.14:03 & generatedtar.tar
  irods             2 tape_2;s3nlwcdsacc02        10240 2025-04-14.14:03 & generatedtar.tar
  irods             0 hot_1;local_filestore_1745          610 2025-04-14.13:58 & tarcontents.list.txt
  C- /WCDSacc/courses/15012025/luijs002/showtarring/dir1
/WCDSacc/courses/15012025/luijs002/showtarring/dir1:
  C- /WCDSacc/courses/15012025/luijs002/showtarring/dir2
/WCDSacc/courses/15012025/luijs002/showtarring/dir2:

Bundling in iRODS without archiving

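If you want to do the bundling in iRODS yourself, you can use the ibun command. See https://docs.irods.org/4.3.1/icommands/user/#ibun for more details on how this works. For example, to bundle a collection into a tar file:

ibun -cDtar /WCDSacc/courses/somedir/subdir.tar /WCDSacc/courses/somedir/subdir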
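ibun takes the bundle file first and the collection second, both for creating (`-c`) and for extracting (`-x`). The `-D` option sets the bundle type when creating. This minimal Python sketch wraps both directions with `subprocess` so the argument order is fixed in one place. The helper names are hypothetical. Only the `tar` type is used here; see the linked iRODS documentation for the other supported types. Unlike the WUR rule above, ibun only bundles: it does not archive to tape or remove the original files.

```python
import subprocess


def ibun_create_command(bundle_path: str, collection: str, data_type: str = "tar") -> list:
    """Arguments for 'ibun -c': bundle an iRODS collection into one iRODS file."""
    return ["ibun", "-c", "-D", data_type, bundle_path, collection]


def ibun_extract_command(bundle_path: str, target_collection: str) -> list:
    """Arguments for 'ibun -x': unpack an iRODS bundle into a collection."""
    return ["ibun", "-x", bundle_path, target_collection]


def run_ibun(args: list) -> None:
    """Run an ibun command; raises CalledProcessError if ibun reports an error."""
    subprocess.run(args, check=True)
```

For example, `run_ibun(ibun_create_command("/WCDSacc/courses/somedir/subdir.tar", "/WCDSacc/courses/somedir/subdir"))` runs the same command as the example above.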