Long-Term Preservation

As explained in Data Sharing & Preservation, IA2 provides storage for “Data Sharing” and “Long-Term Preservation”. The latter means saving data on the Tape Library.

 

 

How the Tape Library works

  • Once data have been written to the Tape Library they can no longer be deleted or modified.

  • Access to the data written on the Tape Library occurs asynchronously via VOSpace.

 

How to request an account

  • The account is personal and cannot be shared by a group. If a group needs to save data, this must be done through an account linked to a contact person.

  • The account must be requested directly from [email address protected from spambots; JavaScript must be enabled to view it] and will use the IDEM credentials.

  • The request must ... indicate the size of the data to be saved, and the frequency with which you plan to access this data.

 

How to write a Data Management Plan

The Data Management Plan must provide all information deemed relevant for proper data preservation following the FAIR principles.

You can follow these guidelines to write your DMP.

  • Requested long-term storage space and technical justification:

    Simply, how much space do you need and why? What is the scientific and technical rationale for the space required? Please remember that data, once written to the Tape Library, cannot be deleted or modified. So you need to be sure that you really want to store and preserve them, i.e. that your data, however voluminous, cannot be easily regenerated and are not merely worthless. It may be that only a subset of your data requires long-term preservation, not all of them. Weigh the cost of generating them again against the benefit of keeping them on long-term storage.

  • Type and format of data and description of the structure of the data collection: 

    What type of data do you want to store (e.g. simulations, observations, etc.), what file formats (e.g. fits, hdf5, tar, etc.), how big are these files? Are they compressed or not? How are the data and the corresponding metadata structured? How are data and metadata organized into files and folders? Provide any useful information to understand the type of data and metadata you want to store and how they are organised.

  • Expected access frequency:

    How often will the data be accessed? Describe both how much data will be accessed or downloaded at one time and how often the data will be accessed over time. You can differentiate by type of data, since some data will be accessed more frequently than others.

  • Access policy:

    Who has access to the data? Just the contact person of the account or a group of people? Do we need to take into account restrictive authorization rules for accessing data?

  • Do you plan to publish your data?

    It is possible that the data are private now, but they could be made available for public access in the future. If your data are for private access only, answer "No". Otherwise, provide any useful information about making your data public according to the FAIR principles, and how you plan to do so. Who will be the reference community if your data are public? Will all your data be public or will only a subset be published? Will there be a financing plan to support this activity?

  • Additional information:

    Provide us with any further information that you deem useful and necessary to allow us to best store your data.

     

 

How to ingest data

  • Once the account has been created, the user will be allocated a scratch area to which they can transfer their data;

  • The transfer can be done via scp or rsync. Any additional software should be agreed directly with the IA2 team;

  • Once the user has completed the transfer of a block of data (a directory with its sub-directories) and has notified the IA2 team, the block will be taken over and frozen;

  • Please structure the data in directories containing no more than 2000 files each; bear in mind that, during the storage procedure, leaf directories containing more than 2000 files will be converted into .tar files. You can use this Bash script to check your files before uploading them. For each directory, we ask you to produce a file listing the computed checksums of the files it contains, so that file integrity can be verified. Please leave each checksum file in its directory.

  • Note: directories containing too many files (more than 2000 per directory) can degrade both the efficiency of the file system and the data transfer. We therefore suggest creating a tar archive before transferring them;

  • Here is a checksum script to calculate the files' checksums as expected. Before starting this process, please check that file and directory names do not contain special characters such as:

    [ ] < > ? \ / " : | ' ` *


    Suppose your data are in /home/<yourname.surname>/<my_data>.

    Put the script in /home/<yourname.surname>, make it executable, and run it:

    cd /home/<yourname.surname>
    chmod +x checksum
    bash checksum my_data &

     

    The script will run in the background and create a log file. To check whether it is still running, use:

    pgrep -f checksum

     

  • Once data have been saved on tape, the user can manage them in "read-only" mode.
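
The pre-upload checks requested above (no more than 2000 files per directory, no special characters in names) can be sketched in Bash as follows. This is only an illustration and not the script linked above: the function names list_crowded_dirs and list_bad_names and the MAX_FILES variable are hypothetical. Note also that the first function counts the files directly contained in every directory, not only in leaf directories, and that it assumes directory names contain no newlines.

```shell
# Illustrative pre-upload check; NOT the official IA2 script.
# MAX_FILES (hypothetical name) mirrors the 2000-file limit stated above.
MAX_FILES=${MAX_FILES:-2000}

# Print "<count><TAB><directory>" for every directory under $1 that
# directly contains more than MAX_FILES regular files.
list_crowded_dirs() {
    find "$1" -type d | while IFS= read -r dir; do
        count=$(find "$dir" -maxdepth 1 -type f | wc -l | tr -d '[:space:]')
        if [ "$count" -gt "$MAX_FILES" ]; then
            printf '%s\t%s\n' "$count" "$dir"
        fi
    done
}

# Print every file or directory under $1 whose name contains one of
# the characters  [ ] < > ? \ " : | ' ` *
# (a "/" cannot occur inside a name, so it is not tested).
list_bad_names() {
    find "$1" -mindepth 1 -name '*[][<>?\\":|'\''`*]*' -print
}
```

For example, running list_crowded_dirs /home/<yourname.surname>/<my_data> followed by list_bad_names on the same path should print nothing when the data are ready for transfer.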

 

Data Retrieval

  • Data on tape can be retrieved via VOSpace. From the VOSpace interface the user can async-recall both individual files and folders. Once these files or folders have been recalled, download links appear on the VOSpace interface. Small files can be downloaded directly from the VOSpace interface. For big files, a large number of files, or entire folders, we suggest downloading with scp or rsync. In any case, the data must be async-recalled before they can be downloaded.

  • The requested data will be placed in the user's directory with the original path.
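
Since a checksum file is left in each directory at ingestion time, the integrity of retrieved data can be checked after download. The following Bash sketch shows one way to do this. It rests on assumptions that are not confirmed by this page: that the checksum files are in the format produced by the GNU sha256sum tool, and that they are named checksums.sha256 (a hypothetical name; adapt both to what your checksum files actually contain). The function name verify_checksums is also hypothetical.

```shell
# Illustrative integrity check for retrieved data; NOT an official IA2 tool.
# Assumption: each directory holds a checksum list in sha256sum format,
# stored under the hypothetical name below.
CHECKSUM_FILE=${CHECKSUM_FILE:-checksums.sha256}

# Verify every checksum file found under $1.
# Returns 0 if all listed files match, 1 if any check fails.
verify_checksums() {
    find "$1" -type f -name "$CHECKSUM_FILE" | (
        failed=0
        while IFS= read -r list; do
            dir=$(dirname "$list")
            # Run the check from inside the directory, because the
            # paths in the list are relative to it.
            if ! (cd "$dir" && sha256sum --quiet -c "$CHECKSUM_FILE"); then
                echo "checksum mismatch in $dir" >&2
                failed=1
            fi
        done
        exit "$failed"
    )
}
```

Called as verify_checksums <path_to_recalled_data>, the function prints one line on standard error for each directory in which a file does not match. It also returns 0 when no checksum file is found at all, so an empty result is not by itself proof of integrity.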

 

Note

  • The user does not have direct access to write on the Tape Library.

 

FAQ

  • Q. How can I calculate the checksums?
    A. You can calculate the checksums with this script.

  • Q. How long will the files be available online?
    A. It depends on the availability of fast disk space and on the requested amount of storage; in general it is about one month.

  • Q. Is there a fee to pay?
    A. Yes, if the amount of data to import to tape exceeds 250 TB; the cost is roughly the cartridge cost per terabyte.
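
Regarding the first question, the idea of one checksum file per directory can be sketched in Bash as follows. This is not the script provided by IA2, which should be used for the actual ingestion because it produces the expected format. The SHA-256 algorithm, the file name checksums.sha256, and the function name write_checksums are all assumptions made for illustration.

```shell
# Illustrative per-directory checksum generation; NOT the official IA2 script.
# Assumptions: SHA-256 via GNU sha256sum, and the hypothetical file name below.
CHECKSUM_FILE=${CHECKSUM_FILE:-checksums.sha256}

# For every directory under $1, write a list of the checksums of the
# regular files it directly contains, and leave it in that directory.
write_checksums() {
    find "$1" -type d | while IFS= read -r dir; do
        (
            cd "$dir" || exit 1
            # Collect the checksums first, so that the list being
            # written is never hashed itself.
            sums=$(find . -maxdepth 1 -type f ! -name "$CHECKSUM_FILE" \
                       -exec sha256sum {} + | sort -k2)
            # Directories without regular files get no checksum file.
            if [ -n "$sums" ]; then
                printf '%s\n' "$sums" > "$CHECKSUM_FILE"
            fi
        )
    done
}
```

A call such as write_checksums /home/<yourname.surname>/<my_data> leaves one checksum list in each directory that contains files; each list can later be checked from inside its directory with sha256sum -c.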