Archive and publish data
Data must not only be stored during the research process but should also be archived at an appropriate time, in keeping with good scientific practice. For example, the DFG and the University of Kassel require that research data be stored for at least 10 years.
Repositories in particular can perform this archiving function. They also offer the possibility of publishing data.
We recommend storing and, if necessary, publishing the data in a subject-specific repository. A targeted search for subject-specific repositories is offered by the RepositoryFinder.
The University of Kassel provides all researchers who cannot or do not wish to use a subject-specific repository with an institutional repository (DaKS), which fulfills the function of both archiving and publication. This can also be used for student projects and theses.
Publishing your data offers advantages for the scientific system, but also for you personally.
Published data are available for subsequent use in new contexts, for example for interdisciplinary questions or meta-analyses. This not only creates scientific added value, but also avoids duplication of work and saves costs.
By assigning persistent identifiers, your data can be permanently referenced and cited by yourself and others. This is a prerequisite for data publications to be recognized as an independent achievement and to enter the scientific reputation system. A study by Piwowar and Vision (2013) also found a higher citation rate for publications whose underlying research data have been published.
Last but not least, in some cases publication simply fulfills the requirements of third parties. In addition to research funders, publication service providers are also increasingly demanding that the research data on which a publication is based be made available. Some examples of such requirements are:
- Public Library of Science (PLOS): Data Availability Policy / Materials and Software Sharing Policy
- Nature Publishing Group: Availability of Data, Material and Methods Policy
- Science: Data and Materials Availability Policy / Preparing Your Supplementary Materials
- BioMed Central: Availability of supporting data
- Elsevier: Research Data Policy / Text and Data Mining Policy
There are subject-specific or thematic repositories as well as generic ones. Subject repositories and data centers (such as Pangaea for geoscientific data, GenBank, or the Protein Data Bank) are often the first choice, both for visibility in the subject community and for conformity to subject-specific standards. An overview of subject repositories is provided by the Registry of Research Data Repositories (re3data.org) and the Open Access Directory to research data. A targeted search for subject repositories that also allow data storage is offered by the re3data-based RepositoryFinder.
When deciding on a particular repository, the following points can help you:
- Is it a repository that fits the subject matter? Is it established and connected to specific search portals?
- Does the repository offer the desired services, such as persistent identifiers (PIDs), open access, differentiated access rights (e.g. user agreements), and embargo periods?
- Is the sustainability of the repository guaranteed? Is there an exit strategy or an agreement to preserve the data, for example if funding is discontinued?
- How are data transfer and data use regulated in terms of content and form?
The University of Kassel also provides all researchers who cannot or do not wish to use a subject-specific repository with an institutional repository, DaKS (expected to be available from mid-January 2021), which fulfills both archiving and publication functions (see also "Archiving and publishing data"). This repository can also be used for student projects and theses.
In addition, interdisciplinary repositories for research data are available, such as the EU-funded ZENODO, Dryad or figshare.
Uploading your data does not equate to open access. In principle, you can also publish research data with a delay or make only the metadata accessible. If you do publish, you can regulate the rights to access and edit the data in detail via the license or contracts (see also "Can I then control the use of my data at all?"). These options may be limited by:
- the specific requirements and policies of your research funders and/or publishers
- lack of/limited rights to the data
- restrictions under data protection law
- restrictions on the part of the repository
There are situations in which data should not be published or should only be published under certain conditions. The most important prerequisite for publication is that you have the right to do so (see also "Who may decide on the disclosure and publication of data?" and "Do I own the copyright to my data?").
In addition, the data may be confidential or personal, in which case they may only be published after anonymization or with the consent of the persons concerned (see also "What data protection restrictions must I observe?").
Metadata is used to describe resources, in this case research data, in order to optimize their discoverability. Basic information includes, for example, title, author/primary researcher, institution, identifier, location & time period, subject, rights, file names, formats, etc. Since this information is essential for finding, understanding, and using data, standardized metadata schemas are intended to ensure that descriptions are as uniform and comprehensible as possible.
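As an illustration, the basic information listed above can be captured in a small structured record and checked for gaps before upload. The following is a minimal sketch, not a real schema: the class name, the field names, and the example values are all assumptions chosen to mirror the list above.

```python
from dataclasses import dataclass, field, asdict
from typing import List


# Field names mirror the basic information listed above. They are
# illustrative only and do not follow any particular metadata schema.
@dataclass
class DatasetMetadata:
    title: str
    creator: str          # author / primary researcher
    institution: str
    identifier: str       # e.g. a persistent identifier such as a DOI
    location: str
    time_period: str
    subject: str
    rights: str
    file_names: List[str] = field(default_factory=list)
    formats: List[str] = field(default_factory=list)


def missing_fields(record: DatasetMetadata) -> List[str]:
    """Return the names of all fields that are still empty."""
    return [name for name, value in asdict(record).items() if not value]


# Hypothetical example record; none of these values refer to a real dataset.
record = DatasetMetadata(
    title="Example soil moisture measurements",
    creator="Doe, Jane",
    institution="Example University",
    identifier="",        # not yet assigned by a repository
    location="Kassel, Germany",
    time_period="2020-01/2020-12",
    subject="Soil science",
    rights="CC BY 4.0",
    file_names=["moisture_2020.csv"],
    formats=["text/csv"],
)

print(missing_fields(record))
```

Here the check reports `identifier` as the only empty field, which is typical before a repository has assigned a persistent identifier.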
Metadata schemas are compilations of elements for describing data. Some disciplines already have specific metadata schemas, such as
- Humanities: Text Encoding Initiative (TEI)
- Earth Sciences: ISO 19115, Darwin Core
- Natural Sciences: ICAT schema, Crystallographic Information Framework (CIF), Climate and Forecast (CF) Metadata Conventions
- Social and economic sciences: Data Documentation Initiative (DDI)
Before you start documenting your data, ideally already as part of a data management plan, you should therefore check whether a suitable metadata schema already exists for your discipline. Information on this is provided, for example, by the Digital Curation Centre (DCC). If no discipline-specific schema is available, a discipline-independent one, such as Dublin Core, MARC21 or RADAR, can also be used.
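To show what a discipline-independent description might look like, here is a minimal sketch that writes a few Dublin Core elements as XML using Python's standard library. The namespace is that of the Dublin Core element set; the wrapper element `metadata`, the function name, and the example values are assumptions for illustration. Repositories usually define their own required elements and upload format.

```python
import xml.etree.ElementTree as ET
from typing import Dict

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)


def to_dublin_core(fields: Dict[str, str]) -> str:
    """Serialize a flat mapping of Dublin Core element names to an XML string."""
    # The wrapper element name "metadata" is an assumption, not part of Dublin Core.
    root = ET.Element("metadata")
    for name, value in fields.items():
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical example values.
xml_record = to_dublin_core({
    "title": "Example soil moisture measurements",
    "creator": "Doe, Jane",
    "publisher": "Example University",
    "subject": "Soil science",
    "format": "text/csv",
    "rights": "CC BY 4.0",
})
print(xml_record)
```

Each key becomes one `dc:` element, so the same record can be read by any tool that understands the Dublin Core element set.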
Metadata schemas thus specify what information should be provided. For the best possible search and use of the data, it is also important to provide this information in as uniform a format as possible. A number of discipline-specific and cross-disciplinary controlled vocabularies, thesauri, classifications and authority files are available for this purpose, such as:
- Standards for the unique identification of individuals, such as Open Researcher and Contributor ID (ORCID) or International Standard Name Identifier (ISNI, ISO 27729)
- Subject classification systems, e.g. Dewey Decimal Classification (DDC) or Library of Congress Classification (LCC)
- Subject-specific classifications such as the Mathematics Subject Classification (MSC) or the Social Sciences Classification
- Subject-specific thesauri such as the Thesaurus of Social Sciences (TheSoz), the Standard Thesaurus of Economics (STW) or the Getty Vocabularies (AAT, TGN, CONA, ULAN)
An overview of different systems is provided, for example, by the Basel Register of Thesauri, Ontologies & Classifications (BARTOC) and Taxonomy Warehouse.
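Standardized identifiers can also be checked automatically before they are entered into a metadata record. An ORCID iD, for example, consists of 16 characters whose last character is a check digit computed with the ISO 7064 MOD 11-2 algorithm. The sketch below verifies only that the check digit is consistent; it does not confirm that the iD is actually registered. The function names are illustrative.

```python
def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the ORCID iD has 16 characters and a matching check digit."""
    compact = orcid.replace("-", "")
    if len(compact) != 16:
        return False
    base, check = compact[:15], compact[15].upper()
    if not (base.isascii() and base.isdigit()):
        return False
    return orcid_check_digit(base) == check


# 0000-0002-1825-0097 is the iD of the fictitious researcher "Josiah Carberry"
# that ORCID uses in its own examples.
print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False (wrong check digit)
```

A check like this catches typing errors early, before an incorrect identifier ends up in published metadata.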
Documentation usually goes beyond the description of data via metadata. It provides a deeper (scientific) account of the data, describing in detail, for example, the context of origin, variables, instruments and methods. In many cases, such a description is indispensable for understanding, verifying and, if necessary, reusing the data.
Introductions to the topic of metadata are provided, for example, by the JISC Guide or the interactive Mantra course of the University of Edinburgh.
Unless otherwise noted, all texts on this site and its subpages are licensed under a Creative Commons Attribution 4.0 International License.
