
Step 9: Archiving data in a WUR irods instance

Request archive

Within WUR we have a tape archive that you can use to store data cheaply. To archive data, you run an irods rule. Suppose you want to archive the /WCDSacc/courses/11032025/your_username/iris_data_copy/iris.names dataset; you can do so by executing the command below. N.B. This rule is only available in WUR irods instances that have tape archiving, not in yoda or in irods instances from other institutions.

Request archive in WUR
irule -r irods_rule_engine_plugin-irods_rule_language-instance rdm_archive_this '*file_or_collection=/WCDSacc/courses/11032025/your_username/iris_data_copy/iris.names' ruleExecOut

You will now probably see a warning that states:

Level 0: warning: file /WCDSacc/courses/11032025/your_username/iris_data_copy/iris.names does not have a checksum, file will NOT be tagged because archiving will fail

This is because the rule checks whether the file has a checksum. Our archiving system uses this checksum to verify that the file you uploaded to the irods system is the same file that ends up on the physical tape. To get a file with a checksum, we will upload the file again with the -K flag. This flag tells irods to calculate the checksum of the file on your machine, calculate it again on the irods side, and verify that both checksums are identical. This ensures data integrity during the first part of your upload process.

N.B. You can run the same command with lowercase -k, but in that case the checksum will ONLY be calculated on the irods side, and not verified against your local file!

imkdir /WCDSacc/courses/11032025/your_username/iris_data_checksum
iput -K iris.data /WCDSacc/courses/11032025/your_username/iris_data_checksum/
iput -K iris.names /WCDSacc/courses/11032025/your_username/iris_data_checksum/

After uploading, you can verify that the checksum is indeed present by using the -L flag of ils:

ils -L /WCDSacc/courses/11032025/your_username/iris_data_checksum
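
If you want to compare the checksum that ils -L reports with your local copy by hand, you can compute the local value yourself. The sketch below is only an illustration and assumes the instance uses SHA-256, in which case irods displays the checksum as `sha2:` followed by the base64-encoded digest; instances configured for MD5 show a plain hex digest instead. The function name `irods_style_sha256` is made up for this example.

```python
import base64
import hashlib


def irods_style_sha256(path, chunk_size=1024 * 1024):
    """Return a local file's SHA-256 checksum in the 'sha2:<base64>' form.

    Assumption: the irods instance is configured for SHA-256 checksums.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return "sha2:" + base64.b64encode(digest.digest()).decode("ascii")
```

Calling `irods_style_sha256("iris.names")` in the folder that holds the file returns a string you can compare with the checksum column of the ils -L output.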

Now redo the rule execution. Note that you can also do this at folder level (/WCDSacc/courses/11032025/your_username/iris_data_checksum). In that case every file that resides in this folder or its subfolders will be archived.

irule -r irods_rule_engine_plugin-irods_rule_language-instance rdm_archive_this '*file_or_collection=/WCDSacc/courses/11032025/your_username/iris_data_checksum' ruleExecOut
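
Because the rule skips files that have no checksum, it can help to check a whole folder before requesting the archive. The following is a minimal sketch, not part of the WUR tooling: it assumes you pass in a collection object from python-irodsclient (for example the result of `session.collections.get(...)`), whose `walk()` method yields `(collection, subcollections, data_objects)` tuples and whose data objects expose `path` and `checksum`. The function name is hypothetical.

```python
def objects_without_checksum(collection):
    """Return the logical paths of data objects that have no checksum.

    `collection` is assumed to behave like a python-irodsclient collection:
    walk() yields (collection, subcollections, data_objects) tuples and
    every data object has .path and .checksum attributes.
    """
    missing = []
    for _coll, _subcolls, data_objects in collection.walk():
        for obj in data_objects:
            # An empty or missing checksum means the rule would not tag it.
            if not obj.checksum:
                missing.append(obj.path)
    return missing
```

An empty list means every file in the folder and its subfolders has a checksum; any path that is returned would need to be re-uploaded with -K first.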

First, you need to re-create the deleted collection and re-upload the files. And now, you know how to do it!

The archiving functionality is triggered by the execution of a rule. In iBridges (at this moment), it is not possible to trigger a rule, but it is possible to 'request' the file archive by adding metadata.

Note: This workaround of adding metadata only works for files and not for collections.

Go to the file 'iris.names' and add the metadata: archive_status = archive_requested

File archive requested Collection archive requested

Before you archive a data object or collection, you need to make sure that a checksum has been calculated. This is because our archiving system uses this checksum to verify that the file you uploaded to the irods system is the same file that ends up on the physical tape. To get a file with a checksum, we will re-upload the file with the -K flag (capital K!). This flag tells irods to calculate the checksum of the file on your machine, calculate it again on the irods side, and verify that both checksums are identical. This ensures data integrity during the first part of your upload process.

In your command line, run:

gocmd mkdir /WCDSacc/courses/11032025/your_username/iris_data_checksum
gocmd put -K iris.names /WCDSacc/courses/11032025/your_username/iris_data_checksum
gocmd put -K iris.data /WCDSacc/courses/11032025/your_username/iris_data_checksum

Verify that the checksum has been calculated:

gocmd ls -L /WCDSacc/courses/11032025/your_username/iris_data_checksum/iris.names

Now, the files have been uploaded with a checksum and are ready to be archived.

The archiving functionality is triggered by the execution of a rule. In GoCMD (at this moment), it is not possible to trigger a rule, but it is possible to 'request' the file archive by adding metadata.

Note: This workaround of adding metadata only works for files and not for collections.

To the files that you want to archive, add the metadata: archive_status = archive_requested by running:

gocmd addmeta /WCDSacc/courses/11032025/your_username/iris_data_checksum/iris.names "archive_status" "archive_requested"

Before we archive a data object or collection, you need to make sure that a checksum has been calculated. This is because our archiving system will use this checksum to see if the file that you uploaded to the irods system is also the same file that ends up on the physical tape.

To begin, we will need to import some functions and define these 2 new functions:

import os
from irods import rule
import irods.keywords as kw
from connect import connect_to_irods
from upload import createcoll
from metadata import list_meta
session = connect_to_irods()


def upload_with_checksum(local_file, irods_dest):
    if not os.path.exists(local_file):
        print("ERROR: Invalid path/non-existent file.")
        return
    options = {kw.VERIFY_CHKSUM_KW: ''}
    session.data_objects.put(local_file, irods_dest, **options)
    obj = session.data_objects.get(irods_dest)
    checksum = obj.chksum()
    print(f"Upload successful. Checksum: {checksum}")
    return obj


def arch_rule(collectionname, filename=None):
    rule_body = 'rdm_archive_this'
    if filename is None:
        input_params = {'*file_or_collection': f"{collectionname}"}
    else:
        input_params = {'*file_or_collection': f"{collectionname}/{filename}"}
    rule_output = 'ruleExecOut'
    ru = rule.Rule(session, body=rule_body, instance_name="irods_rule_engine_plugin-irods_rule_language-instance", params=input_params, output=rule_output)
    out = ru.execute()
    # Print whatever the rule wrote to stderr and stdout.
    if out and len(out.MsParam_PI):
        stderr_buf = out.MsParam_PI[0].inOutStruct.stderrBuf.buf
        stdout_buf = out.MsParam_PI[0].inOutStruct.stdoutBuf.buf
        if stderr_buf:
            print(stderr_buf.rstrip(b'\0').decode('utf8'))
        if stdout_buf:
            print(stdout_buf.rstrip(b'\0').decode('utf8'))
...
Now, we create a collection and re-upload our file to that collection (with a checksum) and execute the archiving rule:
...
# Create a collection
createcoll('/WCDSacc/courses/11032025/your_username/iris_data_checksum')

# Upload with checksum
upload_with_checksum(r'C:\Users\abcde001\iris.data', '/WCDSacc/courses/11032025/your_username/iris_data_checksum/iris.data')
# Checksum will be printed

# Execute archive rule
arch_rule('/WCDSacc/courses/11032025/your_username/iris_data_checksum', 'iris.data')
...

After rule execution you will see that some new metadata has been added by the system:

WUR archive metadata
imeta ls -d /WCDSacc/courses/11032025/your_username/iris_data_checksum/iris.data
imeta ls -d /WCDSacc/courses/11032025/your_username/iris_data_checksum/iris.names

Archive process new metadata

gocmd lsmeta /WCDSacc/courses/11032025/your_username/iris_data_checksum/iris.names
...
# Check archive status
list_meta('/WCDSacc/courses/11032025/your_username/iris_data_checksum/iris.data')
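
The list_meta helper is imported from this tutorial's own metadata module, and its implementation is not shown here. If you only need the archive state, a small function like the one below may be enough. It is a sketch under the assumption that you pass in a python-irodsclient data object (for example from `session.data_objects.get(...)`), whose `metadata.items()` returns entries with `name` and `value` attributes; the function name is hypothetical.

```python
def get_archive_status(data_object):
    """Return all values of the archive_status tag on a data object.

    `data_object` is assumed to behave like a python-irodsclient data
    object: data_object.metadata.items() returns entries that have
    .name and .value attributes.
    """
    return [
        avu.value
        for avu in data_object.metadata.items()
        if avu.name == "archive_status"
    ]
```

An empty list means that no archive has been requested for that object yet.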

Archive status

The tag archive_status is a protected tag used by our automation in irods. The system updates the status as archiving progresses. We consider archiving done only when the final state is reached; in intermediate states we cannot yet be 100% sure that data integrity has been preserved. If you intend to delete the data on your local machine, wait for the final state in the diagram:

%%---
%%title: archive_status state diagram 
%%---
stateDiagram-v2
    direction LR;
    [*] --> archive_requested
    archive_requested --> archive_performed
    archive_performed --> archive_completed
    archive_completed --> completed_and_hot_deleted
    completed_and_hot_deleted --> [*]
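
The same sequence can be written down in a few lines of Python, which may be handy in a script that decides whether a local copy can be removed. This is only a sketch of the diagram above: the state names are taken from it, while the function names are made up for illustration.

```python
# States in the order shown in the archive_status diagram.
ARCHIVE_STATES = [
    "archive_requested",
    "archive_performed",
    "archive_completed",
    "completed_and_hot_deleted",
]


def archive_step(status):
    """Return (step, total_steps) for a status from the diagram."""
    if status not in ARCHIVE_STATES:
        raise ValueError(f"Unknown archive_status: {status!r}")
    return ARCHIVE_STATES.index(status) + 1, len(ARCHIVE_STATES)


def is_archive_finished(status):
    """True only for the final state; intermediate states return False."""
    return status == ARCHIVE_STATES[-1]
```

For example, `is_archive_finished("archive_performed")` is False, so the local data should be kept for now.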

Pre-archived example

A file was pre-archived: /WCDSacc/courses/11032025/this_file_is_already_archived.txt

Note: Have a look at the replicas.

WUR archive file completed, metadata

Use the imeta and ils commands to find out what characteristics this file has.

Navigate to the file and see the metadata and the replicas information.

Use gocmd lsmeta to check the status of the archive request.

Use the list_meta() function to check the status.