WUR features
iRODS provides a means to automate data management workflows and connect storage systems; you could consider it a Swiss army knife. This lets us make specific choices to tailor our implementation to the WUR ecosystem. This document explains what we offer inside WUR.
Different types of iRODS usage
We have a generic iRODS server available to everyone inside WUR. To use it, you can either request a file transfer from the W-drive to tape via ServiceNow, or transfer files to iRODS yourself. The latter option gives you more freedom and lets you use other iRODS functionality, such as adding metadata.
If your requirements go beyond what the generic server offers, or you want to experiment with more advanced options, a private iRODS server might be what you need. Everything described below is available on both the generic server and a private server, but a private server can offer additional functionality.
In addition, there is a YODA server at WUR, maintained by SURF. Check here for more information on YODA at WUR.
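A minimal sketch of self-uploading a file and attaching metadata, assuming the python-irodsclient package (`pip install python-irodsclient`) and an already opened session. The function name `upload_with_metadata` and the example attribute names are hypothetical, not a WUR convention.

```python
def upload_with_metadata(session, local_path, irods_path, metadata):
    """Upload local_path to irods_path, then attach each key/value pair as an AVU.

    session is assumed to be an open irods.session.iRODSSession
    (python-irodsclient); the metadata keys are your own choice.
    """
    # Copy the local file into iRODS
    session.data_objects.put(local_path, irods_path)
    # Fetch the new data object so metadata can be attached to it
    obj = session.data_objects.get(irods_path)
    for attribute, value in metadata.items():
        # iRODS stores metadata values as strings, so convert explicitly
        obj.metadata.add(attribute, str(value))
    return obj

# Hypothetical example call (zone, paths and attribute names are illustrative only):
# upload_with_metadata(session, "results.csv",
#                      "/exampleZone/home/your_user/results.csv",
#                      {"project": "my-project", "year": 2024})
```

To open the session, python-irodsclient provides `iRODSSession` in `irods.session`, used as a context manager with `host`, `port`, `user`, `password` and `zone` arguments; ask team RDM for the connection details. From the command line, the icommands `iput` and `imeta add` do the same.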
| Feature | Upload service by IT | Self-uploading to generic iRODS | Private iRODS instance | YODA@WUR |
|---|---|---|---|---|
| Uploading | X | X | X | X |
| Tape Archiving | X | X | X | |
| iRODS metadata | X | X | X | |
| fine-grained file access control options | X | X | ||
| external access with SRAM | X | X | ||
| iRODS HTTP API | X | |||
| Other customisation options | X | |||
| Archived/Vaulted data protection | X | X | X | X |
| Storage at WUR | X | X | ||
| davrods | X | X | X | |
| Publication module (soon) | X | |||
| YODA JSON metadata | X | |||
| YODA web GUI | X |
Tape archiving and data integrity
The tape storage service provides long-term storage for data that is no longer used on a daily basis. The data is stored on tape in two geographically separated WUR datacenters on campus for an agreed period of time. The service combines iRODS with Fujifilm tape storage: iRODS stores the data, keeps track of ownership, and manages data transfers and data integrity. Data is first copied into iRODS and from there replicated four times to tape (two replicas per datacenter). iRODS verifies that the checksum of the original matches the checksum of the copy on tape, so data integrity is preserved.
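The checksum comparison described above can also be reproduced on your side. The sketch below assumes the server uses iRODS' SHA-256 checksum scheme, which iRODS reports as `sha2:` followed by the base64-encoded digest (for example in the output of `ichksum`); whether WUR's instances use this scheme is an assumption, and the function names are hypothetical.

```python
import base64
import hashlib

def irods_sha256_checksum(path, chunk_size=1024 * 1024):
    """Return a local file's checksum in iRODS' SHA-256 format: 'sha2:' + base64(digest)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so very large files never need to fit in memory
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return "sha2:" + base64.b64encode(digest.digest()).decode("ascii")

def checksums_match(local_path, irods_checksum):
    """Compare a local file with a checksum string reported by iRODS."""
    return irods_sha256_checksum(local_path) == irods_checksum
```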
The benefits of storing data on tape are (figures provided by the supplier):
- Tape has an error rate of 1 read/write error per 1,000 PB, up to 10,000 times lower than hard disks. Our supplier guarantees that tapes last at least 50 years.
- Tape storage consumes up to 100 times less electricity than hard disks.
- Tapes are disconnected from the network, so the data is better protected against malware and ransomware.
- Cost-effective (see Price)
Team RDM has developed a system that ensures proper use of tape and preserves the data integrity of your files. Checksums are compared between the copy on tape, the copy on our local filesystem and the copy on your filesystem. You can follow this process through the file's metadata, as explained below. Once the last step is reached, you can choose to delete the file from your local system.
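A sketch of following the archive process from a script, assuming python-irodsclient and assuming the status is stored as an AVU named `archive_status` on the data object. The attribute name is taken from the process overview, but how WUR actually stores it is an assumption; the helper names are hypothetical.

```python
# Assumption: the archive status is an AVU named "archive_status" on the
# data object, and "completed_and_hot_deleted" is the last step.
LAST_STEP = "completed_and_hot_deleted"

def archive_status(session, irods_path, attribute="archive_status"):
    """Return the archive_status value, or None if there is no single such AVU."""
    obj = session.data_objects.get(irods_path)
    values = obj.metadata.get_all(attribute)
    if len(values) != 1:
        # Not tagged for archiving yet (or ambiguous metadata): report nothing
        return None
    return values[0].value

def local_copy_may_be_deleted(session, irods_path):
    """True once the archive process has reached its last step."""
    return archive_status(session, irods_path) == LAST_STEP
```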
Price
The tape archive price is described in this article. The price includes redundancy across two datacentres. There will be an option to pay monthly, or to pay in advance for a maximum of 10 years. For pricing of a private iRODS server, contact us directly.
Technical process overview
- The user uploads a file to iRODS. iRODS stores the file in local storage: hot_1.
- The user tags the file for archiving. When done, the archive_status is archive_requested.
- A scheduler process copies the file to tape: tape_1 and tape_2. When done, the archive_status is archive_performed.
- A scheduler process checks data integrity using checksums. When done, the archive_status is archive_completed.
- A scheduler process cleans up our server so that the data exists only on tape and nowhere else in iRODS. When done, the archive_status is completed_and_hot_deleted.
```mermaid
---
title: Archive status states
---
stateDiagram-v2
%% direction LR;
[*] --> archive_requested
archive_requested --> archive_performed
archive_performed --> archive_completed
archive_completed --> completed_and_hot_deleted
completed_and_hot_deleted --> [*]
```
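The archive_status transitions form a simple linear state machine. A hypothetical client-side helper that checks transitions and reports progress could look like this; it is an illustration, not the scheduler's actual implementation.

```python
# archive_status values in the order described in the technical process overview
ARCHIVE_STATES = (
    "archive_requested",
    "archive_performed",
    "archive_completed",
    "completed_and_hot_deleted",
)

def is_valid_transition(current, new):
    """True if new directly follows current; current=None means 'not yet tagged'."""
    if current is None:
        return new == ARCHIVE_STATES[0]
    index = ARCHIVE_STATES.index(current)  # raises ValueError for unknown states
    return index + 1 < len(ARCHIVE_STATES) and ARCHIVE_STATES[index + 1] == new

def progress(status):
    """Return (step, total) for display, e.g. (3, 4) for archive_completed."""
    return ARCHIVE_STATES.index(status) + 1, len(ARCHIVE_STATES)
```

Keeping the states in one ordered tuple means the transition rule and the progress display cannot drift apart.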
WUR extra data protection features
We have implemented extra logic inside iRODS to protect against accidental deletion.
Deletion protection
From the moment a file is tagged for archiving, it is protected against deletion: you can still delete the file, but you cannot force-delete it. A deleted file is moved to the trash bin and kept there for some time before it is permanently removed. During this period, the file can be recovered.
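In python-irodsclient, `data_objects.unlink(path, force=False)` moves a data object to the trash, while `force=True` removes it permanently (the same distinction as `irm` versus `irm -f`); the protection described above means the forced variant is not allowed once a file is tagged. The sketch assumes the standard iRODS trash layout, where `/<zone>/home/<user>/...` ends up under `/<zone>/trash/home/<user>/...`. iRODS may add a suffix on name collisions and WUR's extra logic may differ, so check the actual location with `ils` before restoring. The helper names are hypothetical.

```python
def trash_path(logical_path):
    """Map /<zone>/home/<user>/... to /<zone>/trash/home/<user>/... (assumed layout)."""
    parts = logical_path.strip("/").split("/")
    if len(parts) < 3 or parts[1] != "home":
        raise ValueError(f"not a path under /<zone>/home: {logical_path}")
    return "/" + "/".join([parts[0], "trash"] + parts[1:])

def delete_to_trash(session, logical_path):
    """Normal (non-forced) delete: the data object is moved to the trash."""
    session.data_objects.unlink(logical_path, force=False)
    return trash_path(logical_path)

def restore_from_trash(session, logical_path):
    """Move a trashed data object back to its original location."""
    session.data_objects.move(trash_path(logical_path), logical_path)
```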
Tape archive limitations
The tape archive places a few limitations on the files stored in it.
Allowed characters
The tape archive currently supports the following characters:
- Letters: lower case (a to z) and upper case (A to Z) from the English alphabet.
- Numbers: 0 to 9
- Special characters (including the space): `` !-_.*()+ ~@#$%^&{}|:<>?`=[];,"€ ``
If you need additional characters, please get in touch and we will check whether they can be supported.
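Before uploading, you could check file and directory names against the list of supported characters. A minimal sketch; treating `/` as the separator between names (rather than as a character in a name) is an assumption, and `invalid_characters` is a hypothetical helper.

```python
import string

# Characters listed as supported by the tape archive (note: includes the space)
ALLOWED = set(string.ascii_letters + string.digits + "!-_.*()+ ~@#$%^&{}|:<>?`=[];,\"€")

def invalid_characters(path):
    """Return the set of characters in the path's names that the tape archive does not accept.

    "/" is treated as a separator between directory and file names.
    """
    return {ch for name in path.split("/") for ch in name if ch not in ALLOWED}
```

An empty set means every name in the path uses only supported characters.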
File size
Files sent to tape cannot be larger than 5 TB each. This limit applies per file, not to the total storage space.
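A sketch that lists files exceeding the limit before you upload a directory. It assumes "5 TB" means 5 * 10**12 bytes; if TiB (2**40 bytes) is meant, adjust `MAX_TAPE_FILE_SIZE`. The names are hypothetical.

```python
import os

# Assumption: 5 TB read as decimal terabytes; confirm with team RDM
MAX_TAPE_FILE_SIZE = 5 * 10**12

def oversized_files(root, limit=MAX_TAPE_FILE_SIZE):
    """Yield (path, size) for every file under root that is larger than limit bytes."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            if size > limit:
                yield path, size
```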
Bundling files
When sending files to tape, it is wise to consider how these files will be used in the future. Will you use files separately, or will you always download entire directories? Would sending every file separately take a lot of time, or would sending larger chunks save technical overhead? In either case, it might be wise to bundle the files. This is not a technical requirement: the tape archive and iRODS can handle both small and large files.
If you do want to bundle files, you can either do this on your own servers or laptops, or let iRODS do it for you. In the latter case, you upload the files to iRODS and then tell iRODS to bundle and archive them. Documentation will follow.
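If you bundle on your own machine, Python's standard `tarfile` module is one option. A minimal sketch that packs a directory into a single uncompressed tar file; the function name is hypothetical, and this is not the iRODS-side bundling mentioned above.

```python
import os
import tarfile

def bundle_directory(source_dir, archive_path):
    """Pack source_dir into one uncompressed tar file and return the member names."""
    with tarfile.open(archive_path, "w") as tar:
        # arcname keeps paths inside the bundle relative to the directory name
        tar.add(source_dir, arcname=os.path.basename(os.path.normpath(source_dir)))
    with tarfile.open(archive_path, "r") as tar:
        return tar.getnames()
```

Upload the resulting tar file like any other file, and keep each bundle below the 5 TB per-file limit.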
Server maintenance
We maintain various test and production servers. Team RDM Infra takes care of certificates, security measures, attaching different kinds of storage and integrating iRODS with other systems, so that you can focus on using iRODS instead of worrying about maintaining it.