Keeping Research Data Safe 2 - a JISC-funded Project

The identification of long-lived digital datasets for the purposes of cost analysis

Introduction

This web page has been set up to support dissemination of information on the "Keeping Research Data Safe 2" project. The project aimed to extend previous work on digital preservation costs for research data. It identified long-lived datasets for the purpose of cost analysis and built on the work of the first "Keeping Research Data Safe" study completed in 2008.

The first Keeping Research Data Safe study, funded by JISC, made a major contribution to the study of preservation costs by developing a cost model and identifying cost variables for preserving research data in UK universities. That work has had considerable impact and received international interest. Over 3,400 copies of the report were downloaded from the JISC website during 2008 alone, making it JISC's most popular publication that year.

The Keeping Research Data Safe 2 project commenced on 31 March 2009 and the final report was published in May 2010. The project identified and analysed sources of long-lived data and developed longitudinal data on associated preservation costs and benefits. We believe these outcomes will be critical to developing preservation costing tools and cost-benefit analyses for justifying and sustaining major investments in repositories and data curation.

For further information see details below and our final report. Regular updates on related projects and future implementations of KRDS2 will be posted to the Charles Beagrie Blog.

Background

Data has always been fundamental to many areas of research, but in recent years it has become central to more disciplines and inter-disciplinary projects and has grown substantially in scale and complexity. There is increasing awareness of its strategic importance as a resource in addressing modern global challenges and of the possibilities being unlocked by rapid technological advances and their application in research. However, there are several significant challenges facing the academic community relating to the long-term curation, storage, retrieval and discovery of research data. Recognising this, research funders have begun investing heavily in data infrastructure initiatives and developing support for digital repositories and preservation. We believe identifying and developing longitudinal data on preservation costs and benefits associated with long-lived data collections is critical in justifying and sustaining this work and for forward planning and effective resource allocation.

The Project Team

The project was undertaken by a consortium consisting of 4 partners involved in the original Keeping Research Data Safe study (University of Cambridge, Charles Beagrie Ltd, OCLC Research, and University of Southampton) and 4 new partners (Archaeology Data Service, University of Oxford, UK Data Archive, and University of London Computer Centre) with significant data collections and interests in preservation costs. All the partners brought considerable relevant expertise, knowledge and resources to the project.

Project Plan

Our project plan can be downloaded here.

Review and Updates to Activity Model

All of our project partners undertook a detailed review of the activity model published in Keeping Research Data Safe 1 (KRDS1).

The overall finding from this review was that the KRDS1 Activity Model was robust and broadly a good fit to their activities. Some changes were suggested for use in KRDS2, mainly to the wording of definitions and edits to the existing text. In addition, three substantive changes or additions to activities were identified by two or more reviewers and agreed as changes to the KRDS1 model for KRDS2:

  • The need to divide the "outreach and depositor support" sub-activity under Acquisition in the Archive phase.
  • The need to divide the development of the archive's Selection Policy and its application within the selection sub-activity of Acquisition.
  • The need to cover staff training and development as a specific activity.

The revised KRDS2 activity model is available to download in two versions (note that guidance on the use of the activity model is available in the first Keeping Research Data Safe report):

The Data Survey

We used our desk research and input from the project partners to prepare selection criteria for identifying appropriate sources of information to feed into our data survey, and then selected sources for further analytical work. Our selection criteria and definition of scope for research data used in the survey may be downloaded here.

We prepared a survey proforma to identify key research data collections with information on preservation costs and issues. Between September and November 2009 we issued an open invitation via email lists, the project blog and this project webpage for others to contact us and contribute to the data survey if they had research datasets and associated cost information that they believed might be of interest to the study. Our project partners also contributed responses. Completed responses to the data survey are provided below together with a short summary and analysis.

Completed Data Survey Responses

Overview

Summary Analysis of Data Survey Responses

UK Responses

Archaeology Data Service
British Atmospheric Data Service - Natural Environment Research Council
eCrystals (Chemical Crystallography) - University of Southampton
National Library of Wales
Rutherford Appleton Laboratory - Science and Technology Facilities Council
University of Oxford
UKBorders - University of Edinburgh
UK Data Archive - University of Essex
Linnean Society Collections Online - ULCC
National Digital Archive of Datasets (NDAD) - ULCC
Visual Arts Data Service (VADS) - University for the Creative Arts

International Responses

Germany - Bavarian State Library
Netherlands - Data Archiving and Networked Services (DANS)

Our Final Report

Our final report was published by JISC in May 2010 and is available for download as a PDF file. The KRDS2 final report presents the results of the survey of available cost information, the validation and further development of the KRDS activity cost model, a new taxonomy to help assess benefits alongside costs, and six case studies illustrating costs and benefits. Supplementary materials that were not included in detail in the final report (including the individual responses to the data survey listed above) are available on this project webpage.

Other Supplementary Files for the KRDS2 Final Report

ULCC Excel Cost Spreadsheet for the NDAD service 

Guide to Interpreting and Using the NDAD Cost Spreadsheet

National Crystallography Service (NCS) Benefits Study Supplementary Material