File bundling while archiving
Note: This guide uses the iCommands client. See the other guides, install client and setup client, for instructions on getting it running.
When archiving files it is wise to think about bundling them. Bundling means that you put a large number of files into one file. On Windows this often means zipping files, on Linux it often means tarring them. Technically this is not needed for archiving, but it can make it easier to store and download files in one go, especially if the bundle contains all the information from one project. You can bundle either before uploading to iRODS or afterwards. Below we describe both methods.
Bundling before iRODS
You can bundle your files before uploading to iRODS, using your own favourite tools. This can be, but is not limited to:
- WinZip, 7-Zip (Windows)
- tar, gzip (Linux)
You will have to create the bundle yourself and upload it to iRODS afterwards. It is recommended that you make sure you know what is in the bundle. You can think about:
- Creating a list of files that are in the bundle and uploading that alongside
- Adding metadata in iRODS to describe what is in the bundle
- Adding custom files that describe what is in the bundle, like the default YODA metadata scheme, or something along the lines of the ISA-JSON format
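The first suggestion above, a bundle plus a list of its contents, can be sketched in Python with only the standard library. This is a minimal illustration, not a WUR-provided tool: the function name `bundle_with_manifest` and the file names in the usage note are hypothetical.

```python
import tarfile
from pathlib import Path


def bundle_with_manifest(source_dir, bundle_path, manifest_path):
    """Create a gzip-compressed tar bundle of source_dir plus a text manifest.

    Returns the list of relative file paths stored in the bundle.
    """
    source = Path(source_dir)
    # Collect regular files only; sort them for a stable member order.
    files = sorted(p for p in source.rglob("*") if p.is_file())
    names = [p.relative_to(source).as_posix() for p in files]

    with tarfile.open(bundle_path, "w:gz") as tar:
        for path, name in zip(files, names):
            # Store paths relative to source_dir, not absolute paths.
            tar.add(path, arcname=name)

    # One relative path per line, so the list can be read without unpacking.
    Path(manifest_path).write_text("\n".join(names) + "\n", encoding="utf-8")
    return names
```

Calling `bundle_with_manifest("project", "project.tar.gz", "project.list.txt")` would leave two files that can then be uploaded side by side, for example with `iput`. Write the bundle outside the directory being bundled so that it does not end up inside itself on a second run.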
WUR bundling and archiving after uploading to iRODS
In this situation we assume that you have uploaded an unbundled collection of files to iRODS, and want to archive them to tape as a bundle. We created a rule that will do this for you, will check the status and will archive the bundle. You can call the rule like this:
This will:
- create a bundle inside iRODS
- create a file with the contents of the bundle
- archive both files to tape
- remove all original files from the collection
Be aware of the following:
- Metadata added to files in the collection will NOT be preserved
- ACLs of individual files will not be preserved
- To ensure integrity you will no longer have the option to change the collection or the files in the collection after giving the command to bundle.
If needed, we can implement the above in the future.
Tracking progress
After executing the rule the collection will have the following metadata:
- WUR_RDM:collection:archive_tar_status = archive_tar_requested
You can track the WUR_RDM:collection:archive_tar_status metadata tag to make sure your files are archived. In the end, the original files are gone and only the archived tar file and a file listing the contents of your original collection remain.
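One way to read this tag from a script is to call `imeta ls -C` on the collection and parse what it prints. The sketch below assumes the usual `attribute:` / `value:` / `units:` line layout of `imeta ls` output; the helper names are hypothetical and the collection path is whatever you pass in.

```python
import subprocess

STATUS_ATTRIBUTE = "WUR_RDM:collection:archive_tar_status"


def parse_imeta_avus(output):
    """Turn 'attribute:' / 'value:' line pairs into a dict.

    Assumes the layout printed by `imeta ls`; other lines are ignored.
    """
    avus = {}
    attribute = None
    for line in output.splitlines():
        # Split on the first colon only: attribute names contain colons too.
        key, _, rest = line.partition(":")
        if key == "attribute":
            attribute = rest.strip()
        elif key == "value" and attribute is not None:
            avus[attribute] = rest.strip()
            attribute = None
    return avus


def read_archive_tar_status(collection):
    """Return the current archive_tar_status of a collection, or None."""
    result = subprocess.run(
        ["imeta", "ls", "-C", collection, STATUS_ATTRIBUTE],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_imeta_avus(result.stdout).get(STATUS_ATTRIBUTE)
```

Keeping the parsing separate from the `imeta` call means the parser can be checked against saved output without an iRODS connection.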
During the process the regular file archiving process will also be called for the bundle that is created. Below you can find the metadata flow for the archive_tar_status on the collection and the archive_status on the generated bundle.
%%---
%%title: archive_status state diagram
%%---
stateDiagram-v2
direction LR;
state archive_tar_status {
direction LR;
archive_tar_requested --> all_tars_created
all_tars_created --> archive_requested
completed_and_hot_deleted --> all_tars_archived
all_tars_archived --> completed_tarred_archived_cleaned
}
state archive_status {
archive_requested --> archive_performed
archive_performed --> archive_completed
archive_completed --> completed_and_hot_deleted
}
When the status is completed_tarred_archived_cleaned, we consider the files archived and you can safely delete your source files.
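The status flow above can also be written down as plain data, which is handy when a script has to decide whether archiving has finished. The sketch below only encodes the values from the diagram; the function names are hypothetical and it does not talk to iRODS.

```python
# archive_tar_status values on the collection, in the order of the diagram.
ARCHIVE_TAR_FLOW = (
    "archive_tar_requested",
    "all_tars_created",
    "all_tars_archived",
    "completed_tarred_archived_cleaned",
)

# archive_status values on the generated bundle, in the order of the diagram.
ARCHIVE_FLOW = (
    "archive_requested",
    "archive_performed",
    "archive_completed",
    "completed_and_hot_deleted",
)


def is_archived(archive_tar_status):
    """True only for the final collection status."""
    return archive_tar_status == ARCHIVE_TAR_FLOW[-1]


def steps_remaining(archive_tar_status):
    """Number of collection status changes still expected.

    Raises ValueError for a status that is not part of the flow.
    """
    if archive_tar_status not in ARCHIVE_TAR_FLOW:
        raise ValueError(f"unknown archive_tar_status: {archive_tar_status!r}")
    position = ARCHIVE_TAR_FLOW.index(archive_tar_status)
    return len(ARCHIVE_TAR_FLOW) - 1 - position
```

For example, `steps_remaining("all_tars_created")` reports that two more status changes are expected on the collection, while the bundle itself moves through the `archive_status` values in between.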
The data in your collection:
Suppose you have a directory structure like below, visualised with ils -rl:
When you execute the rdm_tar_and_archive_collection command, nothing changes at first except the metadata of the collection:
After a while the directory will look like this:
With this metadata of the collection:
The tarcontents.list.txt file will have the following data in it:
In the end the directory contents will look like this. You can see that all the original files have been removed, and that the tarfile itself is moved to tape.
Bundling in iRODS without archiving
If you want to do the bundling in iRODS yourself, you can use ibun. See https://docs.irods.org/4.3.1/icommands/user/#ibun for more details on how this works.