4. Risk Assessment

The TOAR database contains data from various networks of measuring stations, which are made available for evaluation through an ecosystem of web services (Fig. 2.1). In some cases, individual institutions send data directly to the TOAR data team, where it is processed according to a semi-automated workflow (see description in Automated Data Preparation). All raw data files, harmonised metadata and data are stored at Forschungszentrum Jülich, where they are archived long-term and backed up. To keep the risk to this data and the database as low as possible, we have carried out an internal risk assessment. The data risks are evaluated on the basis of the risk assessment matrix developed by Matthew S. Mayernik et al. [1].

In general, data in the TOAR database is available under the CC-BY 4.0 licence [2]. Personal data is not stored in the database. We take action against each individual risk factor and carry out a new risk assessment every two years.

The main risk categories for the TOAR repository are listed below, together with the relevant risk factors, the estimated degree of control [3], the estimated impact on users, and the countermeasures taken proactively:

4.1. Data Risks

  • Lack of metadata & documentation (risk: high, impact: low)

    During data ingestion into the TOAR database the available metadata is checked for plausibility and missing metadata is generated where possible. Direct communication with data providers is used to clarify incomplete or ambiguous data submissions. Documentation of the database is thoroughly maintained by the TOAR database infrastructure team.

  • Data mislabelling (high, low)

    To avoid data misidentification, we conduct thorough testing of new software and new data submissions. The workflow design includes inspection of ingested data by the provider as well as use of the database in assessment reports, which prompts many scientists to scrutinise the data when analysing it. Errors are documented extensively, and database administrators are trained regularly to sensitise them to these issues.

  • Poor data governance (high, low)

    New staff members receive intensive training, and access rights and responsibilities are tightly controlled (only experienced personnel are allowed to curate data). Curation processes are automated where possible. Possible deterioration of data governance is minimised by retraining or replacing personnel.

  • Accidental deletion (high, medium)

    Only very few skilled people have root privileges on the TOAR database infrastructure. System updates and major database updates are planned by at least two people. The infrastructure is set up so that it rarely requires manual intervention. Restoring data from backup copies is possible.

  • Lack of planning (high, high)

    To avoid overloading a single leader, measures for the transfer of responsibilities are established, and we place importance on written documentation and planning. Regular team meetings are also held to discuss issues, progress and plans.

  • File format obsolescence (medium, low)

    Regular user forums and user interaction allow us to foresee requests, so that code development can be planned early and file formats are prevented from becoming obsolete. Data are stored in a database rather than as files, so a conversion tool can easily be made available. Durable file formats have been chosen for the database.

  • Lack of provenance information (low, low)

    As a precaution, clear rules for data submission are established, including specific metadata attributes for provenance and direct contact with data providers.

  • Over-abundance (low, low)

    Should this occur, we seek direct communication with data providers and plan calls for data submission carefully. Processing and curation are prioritised to limit the impact of delays. We expect such a situation to occur very rarely, as the total expected data volume is quite well known.
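
The plausibility check mentioned under "Lack of metadata & documentation" could look roughly like the following. This is a minimal sketch and not the TOAR ingestion code: the field names, the list of required attributes and the numeric bounds are assumptions chosen for illustration only.

```python
from typing import Any

# Hypothetical required attributes; the real TOAR metadata schema may differ.
REQUIRED_FIELDS = ("station_id", "latitude", "longitude", "altitude")


def check_station_metadata(meta: dict[str, Any]) -> list[str]:
    """Return a list of human-readable problems found in one metadata record."""
    problems = []

    # Report every required attribute that is absent or empty.
    for field in REQUIRED_FIELDS:
        if meta.get(field) is None:
            problems.append(f"missing: {field}")

    lat = meta.get("latitude")
    lon = meta.get("longitude")
    alt = meta.get("altitude")

    # Range checks only run on values that are actually present.
    if lat is not None and not -90.0 <= lat <= 90.0:
        problems.append(f"implausible latitude: {lat}")
    if lon is not None and not -180.0 <= lon <= 180.0:
        problems.append(f"implausible longitude: {lon}")
    # Illustrative altitude bounds in metres (assumption, not a TOAR rule).
    if alt is not None and not -500.0 <= alt <= 9000.0:
        problems.append(f"implausible altitude: {alt}")

    return problems
```

An empty result means the record passed these checks; any reported problem would be a prompt to generate the missing value where possible or to contact the data provider.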

4.2. Physical Risks

  • Media deterioration (very high, medium)

    To guard against the main danger, the ageing of hard disks or tapes, a comprehensive backup strategy is in place, including copies at a remote location (RWTH Aachen). All file systems, including those for the operational database and web services, are located on RAID (Redundant Array of Independent Disks) disks. The hardware is closely monitored and replaced in case of degradation. Should such failures lead to loss of data, it can be restored from backup.

  • Storage hardware breakdown (medium, medium)

    To counteract this risk, the following measures are taken: continuous monitoring of hardware components, a hot-swap strategy, regular acquisition of new hardware, and re-installation of the database and services.

  • Cybersecurity breach (medium, medium)

    Elevated security measures are implemented at JSC and are regularly reviewed and updated. As a long-term HPC service provider, JSC has a strong awareness of cybersecurity issues. In case of a breach, the TOAR data services are cordoned off. After a malicious attack, the TOAR database can be reinstalled from a trusted backup version.

  • Bit rot and data corruption (medium, low)

    Our database technology contains safeguard measures for the TOAR database. Depending on the severity of the problem, either individual values are fixed manually, or long time series and sets of time series are re-inserted from a backup copy or from raw data submissions.

  • Malicious attacks (low, high)

    Access to the hardware hosting the TOAR database infrastructure is regulated and only possible for designated personnel. Security measures include locked doors with an entry system. Furthermore, the hardware is located on a secured campus with manned gates.

  • Human error (low, medium)

    The hardware is operated by trained personnel with long experience in operating complex high performance computer architectures.

  • Catastrophes (low, medium)

    Early warning systems are in place for the machine halls, and the main hall has an argon fire-extinguishing system. A fully equipped fire brigade specialising in all types of fires is located at FZJ, about 650 m away from the supercomputing centre.

    The open design of the TOAR database infrastructure and regular publicly posted database dumps allow anyone to rebuild the TOAR database infrastructure upon loss. As host of the TOAR database infrastructure, JSC takes the specific precautions of supercomputing centres against loss of infrastructure and data in case of local-scale emergencies (fire, flooding, storm), including a comprehensive backup strategy with copies at a remote location. In case of partial loss, JSC staff will re-install the TOAR database and its web services on new or different hardware; in case of total destruction, some other member of the TOAR community would have to rebuild the TOAR database infrastructure from the automated backups and archived software stack.
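
The "Bit rot and data corruption" item above mentions re-inserting time series from backup copies or raw data submissions. One generic way to find out which archived files have silently changed is to compare their checksums with a previously stored manifest. The sketch below illustrates that idea only; it is not the safeguard used by the TOAR database, and the manifest layout (relative path mapped to SHA-256 hex digest) and the function names are assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_corrupted(manifest: dict[str, str], root: Path) -> list[str]:
    """Return relative paths whose file is missing or whose checksum differs.

    `manifest` maps a path relative to `root` to the digest recorded when the
    file was archived (hypothetical layout, for illustration).
    """
    corrupted = []
    for rel_path, expected in sorted(manifest.items()):
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            corrupted.append(rel_path)
    return corrupted
```

Files reported by `find_corrupted` would be candidates for restoration from a backup copy or from the original submission.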

4.3. Human Risks

  • Lack of use (medium, low)

    The strategy to face this risk comprises regular user consultation, deep embedding into the TOAR phase 2 activity, proactive technology development, extensive testing, and thorough data quality control including user feedback. It also includes intensified communication with users and data providers on issues such as poor documentation, data quality perceived as unreliable, user-unfriendly interface(s), incomplete data, or convenience and speed of data availability compared to other sources.

  • Loss of knowledge around context or access (medium, high)

    Personnel are trained continuously, and redundancy is built up in the knowledge needed to operate the TOAR database infrastructure. Knowledge about the TOAR database infrastructure is also built up in the TOAR community. Software and data are archived or published so that the system can be rebuilt by trained personnel.

4.4. Organisational Risks

  • Loss of funding for archive (medium, high)

    Should termination of funding become foreseeable, the TOAR community will be alerted and a replacement will be actively sought. Furthermore, all software and data backups are made publicly available so that a new data centre can be launched with relatively little effort. Relocation of the TOAR database to another site is possible.

  • Legal status for ownership and use (low, medium)

    We only accept data under the condition that they are published with an open access licence (CC-BY 4.0). There is no duration limit on this licence, so it cannot be revoked after data has been published in the TOAR database. In most cases, the TOAR community will seek disclosure. Changes to the rules for the release and use of data are very unlikely.

  • Political interference (low, high)

    The TOAR data centre and its achievements are openly communicated, so no specific measures are taken. In the extreme case, a TOAR partner in another country would be sought to take over the TOAR database infrastructure.