Archiving data on tape in a WUR iRODS instance using iCommands
Note: This guide uses the iCommands client. If you have not installed and configured it yet, see the guides "install client" and "setup client".
Inside WUR we have a tape archive that you can use to store data cheaply. To archive data, you run an iRODS rule.
Suppose you want to archive the file /WCDSacc/somecollection/iris_data_copy/iris.names. You can do so by executing this command. Note: this rule is only available in WUR iRODS instances that have tape archiving, not in Yoda or in iRODS instances from other institutions.
You might now see an error that states:
This happens because the rule checks whether the file has a checksum. Our archiving system uses this checksum to verify that the file you uploaded to iRODS is the same file that ends up on the physical tape. To give the file a checksum, we upload it again with the -K flag. This flag tells iRODS to calculate the checksum of the file on your machine, calculate it again on the iRODS side, and verify that both checksums are the same. This ensures data integrity for the first part of the process: the upload.
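To make the client-side half of -K concrete, here is a minimal Python sketch that computes a local file's SHA-256 checksum in the notation iRODS typically displays. It assumes the instance uses the SHA-256 hash scheme, where checksums appear as `sha2:` followed by the base64-encoded digest; if the WUR instance is configured with a different scheme, the format will differ. The function name `irods_sha256_checksum` is hypothetical, and this is only a way to cross-check a checksum by hand, not a replacement for -K.

```python
import base64
import hashlib
from pathlib import Path


def irods_sha256_checksum(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 checksum of a local file as 'sha2:<base64 digest>'.

    Assumption: the iRODS instance uses the SHA-256 hash scheme, which shows
    checksums as 'sha2:' plus the base64-encoded (not hex) digest.
    """
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return "sha2:" + base64.b64encode(digest.digest()).decode("ascii")
```

For example, `irods_sha256_checksum("iris.names")` returns a string you can compare with the checksum iRODS reports for the uploaded file.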
Note: You can run the same command with lowercase -k, but then the checksum is ONLY calculated on the iRODS side and is not verified against your local file!
After uploading, you can verify that the checksum is present by running ils with a different flag:
Now run the rule again. You can also run it at the folder (collection) level, for example on /WCDSacc/somecollection/iris_data_checksum. In that case every file in this folder and its subfolders will be archived.
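The recursive scope of a folder-level run can be illustrated with plain path strings. This is a minimal Python sketch; the helper `files_in_scope` is hypothetical and does not talk to iRODS. A file is covered when its logical path lies under the folder at any depth. Comparing against the folder path plus a trailing `/` avoids treating a sibling such as `iris_data_checksum2` as part of `iris_data_checksum`.

```python
import posixpath
from typing import Iterable, List


def files_in_scope(paths: Iterable[str], collection: str) -> List[str]:
    """Return the logical paths that lie in `collection` or any subcollection.

    Illustration only: works on path strings and does not query iRODS.
    """
    # normpath drops a trailing slash; rstrip turns the root "/" into "" so
    # the prefix becomes "/" rather than "//".
    prefix = posixpath.normpath(collection).rstrip("/") + "/"
    return [p for p in paths if posixpath.normpath(p).startswith(prefix)]
```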
After the rule has run, you will see that the system has added some new metadata:
The archive_status tag is a protected tag used by our automation in iRODS. The system updates it as archiving progresses, and we consider archiving done only when the final state is reached. In the intermediate states we cannot yet be 100% sure that data integrity has been preserved. If you intend to delete the data from your local machine, wait until archive_status reaches the final state in the diagram:
```mermaid
%%---
%%title: archive_status state diagram
%%---
stateDiagram-v2
direction LR
[*] --> archive_requested
archive_requested --> archive_performed
archive_performed --> archive_completed
archive_completed --> completed_and_hot_deleted
completed_and_hot_deleted --> [*]
```
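If you script the clean-up of local copies, a guard based on the state order from the diagram keeps you from deleting too early. This is a minimal Python sketch; it assumes you have already read the archive_status value into a string (for example from `imeta ls -d` output), and the names `ARCHIVE_STATES`, `archive_progress` and `safe_to_delete_local_copy` are hypothetical.

```python
# archive_status values in the order shown in the state diagram.
ARCHIVE_STATES = (
    "archive_requested",
    "archive_performed",
    "archive_completed",
    "completed_and_hot_deleted",
)


def archive_progress(status: str) -> int:
    """Return the 1-based position of `status` in ARCHIVE_STATES."""
    if status not in ARCHIVE_STATES:
        raise ValueError(f"unknown archive_status: {status!r}")
    return ARCHIVE_STATES.index(status) + 1


def safe_to_delete_local_copy(status: str) -> bool:
    """True only once archiving has reached the final state."""
    return archive_progress(status) == len(ARCHIVE_STATES)
```

Raising on an unknown value, instead of quietly returning False, makes a typo or an unexpected state visible.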
Once a file has been archived, we also protect it by restricting the ability to remove or modify it. See wur_features for details.
Retrieving archived data
Retrieving archived data is simple, but be aware that your data is not stored in a location where you can access it immediately. If you have an archived dataset, you can retrieve it simply by running:
However, because the data is on tape, you will not be able to read it straight away. You will get an error that looks roughly like this:
This means that the tape library has the file, but only on the physical tape itself. The library now starts copying the file to a cache layer. Once the file is in the cache, iRODS can read it and you can download it. The file stays in the cache layer for 7 days before it is removed from there. It is not removed from the tape itself, so after those 7 days you can repeat the steps above if needed.
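The 7-day cache window is simple date arithmetic. This is a minimal Python sketch, assuming the 7 days are counted from the moment the file lands in the cache layer (the exact starting point of the clock is an assumption); the function names are hypothetical, and the result is only an estimate.

```python
from datetime import datetime, timedelta

# Cache retention period stated in this guide.
CACHE_RETENTION = timedelta(days=7)


def cache_expires_at(staged_at: datetime) -> datetime:
    """Moment after which the cached copy may be gone and must be staged again."""
    return staged_at + CACHE_RETENTION


def probably_still_cached(staged_at: datetime, now: datetime) -> bool:
    """Estimate only; the cache status in the file's metadata is authoritative."""
    return now < cache_expires_at(staged_at)
```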
Checking tape cache status
Within WUR we implemented a way to check, through the file's metadata, whether a file on tape is ready to download. Once you try to download a file from tape, we keep a metadata status on it that indicates whether it is ready and which of the two tapes has the file in cache:
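Because staging from tape takes time, a script that needs the file can poll that status instead of retrying the download by hand. This is a minimal Python sketch: `wait_until_cached` and its `is_ready` callback are hypothetical, and since the exact attribute name and values of the cache status are not reproduced here, reading them (for example by parsing `imeta ls -d` output) is left to the callback.

```python
import time
from typing import Callable


def wait_until_cached(
    is_ready: Callable[[], bool],
    timeout_s: float = 3600.0,
    interval_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll is_ready() until it returns True or timeout_s seconds have been waited.

    Returns True if the file became ready within the timeout, else False.
    """
    waited = 0.0
    while True:
        if is_ready():
            return True
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s
```

The `sleep` parameter is injectable so the loop can be tested without real waiting; keeping `interval_s` generous avoids querying the server every few seconds.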
Note: For more advanced usage and customization, refer to the iRODS 4.3.2 documentation and the iRODS community resources.