FAQ – frequently asked questions

Sources of information on research data management

Are you interested in current information about research data management (RDM; German: FDM) in Kassel, such as events, services and resolutions? We have a few helpful sources of information for you:

Introduction and background

Research data is all data that is generated, processed or used in the course of a scientific process or is the result of such a process. Depending on the scientific discipline, research data can be available in different formats.

Research data management is the process in which the generation, management and backup of this data are described and planned. It encompasses all areas of data management, in particular the planning of data collection, the generation and preparation of data, data integrity, documentation and sustainable storage, as well as making the data accessible. This process is developed and documented with the help of a data management plan, which is or should be part of every research project.

The data management plan is a "living document" that initially represents the central planning tool for data management in the research project and develops into a project documentation tool over the course of the project.

Effective Research Data Management (RDM) is the foundation of trustworthy, transparent, and sustainable research. It helps researchers ensure that their results remain reproducible, secure, and reusable in the long term.

RDM helps to…

  • Prevent data loss: Regular backups, redundant storage locations, and the use of suitable file formats keep your work safe over time.
  • Avoid misuse or misinterpretation: Structured documentation and clear access policies maintain data integrity and accountability.
  • Enable transparency and reuse: Comprehensive documentation and meaningful metadata ensure that others can correctly understand and potentially reuse your data.

Systematic RDM starts as early as the planning phase of a research project. Defining early how data will be collected, described, stored, and shared leads to more efficient workflows, better findability, and greater potential for future collaborations.

Moreover, RDM is increasingly required: research funders, universities, and journals often demand data management plans or evidence of data availability. Therefore, make sure to check the specific requirements of your discipline, institution, or funding body:

Research data guidelines of the University of Kassel

Code of Conduct: Guidelines for Safeguarding Good Research Practice (Status: September 2024 / revised version 1.2)

 

Structure, formats and documentation

Research work often produces not only a large number of data sets but also several versions of each, reflecting successive stages of modification. For efficient work, coordinated collaboration, long-term traceability and, where applicable, internal or external reuse, it is advisable to define conventions for naming and versioning data sets. It may also make sense to define a folder structure that reflects the degree of processing. These conventions should themselves be documented.

Naming conventions can look very different depending on the research area and the data. They should indicate the type of data file (original data / raw data, cleaned files, analysis files) or the role of the file (working file, results file, etc.). This distinction can also be expressed through versioning conventions. What matters is that names are uniform, unambiguous and meaningful.

Examples of meaningful file names:

  • [sediment]_[sample]_[instrument]_[YYYYMMDD].dat
  • [experiment]_[reagent]_[instrument]_[YYYYMMDD].csv
  • [experiment]_[experimental design]_[subject]_[YYYYMMDD].sav
  • [observation]_[location]_[YYYYMMDD].mp4
  • [interviewee]_[interviewer]_[YYYYMMDD].mp3

To ensure compatibility between different operating systems, special characters (except underscores and hyphens) and umlauts should be avoided. 
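
The naming rules above can be checked automatically. The following is a minimal sketch, assuming a convention of underscore-separated parts followed by a YYYYMMDD date and a file extension; the function name `check_file_name` and the exact pattern are illustrative assumptions, not a prescribed standard, and should be adapted to your project.

```python
import re
from datetime import datetime

# Assumed convention (hypothetical): one or more parts made of ASCII letters,
# digits and hyphens, joined by underscores, then an 8-digit date and an extension.
NAME_PATTERN = re.compile(
    r"(?P<parts>[A-Za-z0-9-]+(?:_[A-Za-z0-9-]+)*)_(?P<date>\d{8})\.(?P<ext>[A-Za-z0-9]+)"
)


def check_file_name(name: str) -> list[str]:
    """Return a list of problems; an empty list means the name fits the convention."""
    match = NAME_PATTERN.fullmatch(name)
    if match is None:
        # Covers spaces, umlauts, other special characters and a missing date.
        return ["name does not match [part]_[part]_[YYYYMMDD].ext"]
    problems = []
    try:
        # Reject dates such as 20241345 that have the right shape but do not exist.
        datetime.strptime(match.group("date"), "%Y%m%d")
    except ValueError:
        problems.append("date part is not a valid calendar date")
    return problems


if __name__ == "__main__":
    for candidate in ["sediment_S12_XRF_20240315.dat", "Bodenprobe_Köln_20240315.dat"]:
        print(candidate, check_file_name(candidate))
```

A check like this could run before files are moved into a shared project folder, so that deviations are caught while they are still cheap to fix.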

Read-only versions should be created at various stages of modification (e.g., original data, cleaned data, analysis-ready data). Further edits should only be made to copies of these master files.
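
One way to put this into practice is to remove write permissions from a master file and to edit only a copy. This is a minimal sketch using the Python standard library; the function names `freeze_master` and `working_copy` are assumptions made for illustration, and file permissions alone do not replace backups or repository-level protection.

```python
import shutil
import stat
from pathlib import Path


def freeze_master(path: Path) -> None:
    """Remove all write permissions from a master file."""
    mode = path.stat().st_mode
    path.chmod(mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))


def working_copy(master: Path, target: Path) -> Path:
    """Create an editable copy of a master file.

    shutil.copyfile copies only the contents, so the copy does not
    inherit the read-only mode of the master.
    """
    shutil.copyfile(master, target)
    return target
```

Here `freeze_master` would be called once per processing stage, and all further work would happen on files created with `working_copy`.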

A well-known concept of versioning, based on the Data Documentation Initiative (DDI) standard, is:

Starting from version "v1-0-0", the digits are incremented as follows:

1. the first digit, if multiple cases, variables, waves or samples have been added or deleted

2. the second digit, if data have been corrected in a way that affects analyses

3. the third digit, if simple revisions have been made that do not affect meaning.
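
The three rules above can be sketched as a small helper. The labels `structure`, `correction` and `revision` are hypothetical names chosen for this example. Resetting the lower digits to zero when a higher digit is incremented is also an assumption (borrowed from common software versioning practice); the scheme described above does not specify it.

```python
def bump_version(version: str, change: str) -> str:
    """Increment a version string of the form 'vX-Y-Z'.

    change (hypothetical labels):
      'structure'  -> first digit  (cases, variables, waves or samples added or deleted)
      'correction' -> second digit (corrections that affect analyses)
      'revision'   -> third digit  (simple revisions that do not affect meaning)
    """
    if not version.startswith("v"):
        raise ValueError(f"expected a version like 'v1-0-0', got {version!r}")
    first, second, third = (int(part) for part in version[1:].split("-"))
    if change == "structure":
        # Assumption: lower digits restart at zero.
        first, second, third = first + 1, 0, 0
    elif change == "correction":
        second, third = second + 1, 0
    elif change == "revision":
        third += 1
    else:
        raise ValueError(f"unknown change type: {change!r}")
    return f"v{first}-{second}-{third}"
```

For example, `bump_version("v1-0-0", "revision")` returns `"v1-0-1"`, and a subsequent `"correction"` yields `"v1-1-0"`.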

Conventions should always be adapted to subject-specific or project-specific needs. If, for example, versions are not in a linear relationship to each other, the relationships can be defined via the relation types of a metadata schema such as the DataCite Metadata Schema (e.g. "IsDerivedFrom", "IsSourceOf").

Versioning can also be supported by appropriate software (e.g. Git).

A data publication unfolds its full potential when the research data is published in file formats that, on the one hand, allow the most unrestricted possible subsequent use and, on the other hand, are as stable as possible over the long term. From both points of view, open, non-proprietary file formats are recommended: Open file formats can be read by different programs and are not dependent on a specific software, so that subsequent use in different IT environments (such as operating systems, software licenses, common programs) is possible.

With this in mind, this overview of suitable file formats recommends specific file formats for data publication.

Metadata is used to describe resources, in this case research data, in order to optimize their discoverability. Basic information includes, for example, title, author/primary researcher, institution, identifier, location & time period, subject, rights, file names, formats, etc. Since this information is essential for finding, understanding, and using data, standardized metadata schemas are intended to ensure that descriptions are as uniform and comprehensible as possible.

Metadata schemas are compilations of elements for describing data. Some disciplines already have specific metadata schemas, such as

Before you start documenting your data, ideally already as part of a data management plan, you should therefore check whether a suitable metadata schema already exists for your discipline. Information on this is provided, for example, by the Digital Curation Centre (DCC). If no discipline-specific schema is available, a discipline-independent one, such as Dublin Core, MARC21 or RADAR, can also be used.
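
To illustrate what a discipline-independent description can look like, the following minimal sketch writes a record using the 15 elements of the Dublin Core Metadata Element Set. The function name `to_dublin_core`, the wrapper element `metadata` and all example values are assumptions made for illustration; a repository will normally prescribe its own input form or exchange format.

```python
import xml.etree.ElementTree as ET

# Namespace and element names of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}
ET.register_namespace("dc", DC_NS)


def to_dublin_core(record: dict[str, list[str]]) -> ET.Element:
    """Turn a mapping of Dublin Core element names to values into an XML tree."""
    unknown = set(record) - DC_ELEMENTS
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for name, values in record.items():
        for value in values:  # every element may be repeated
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return root


if __name__ == "__main__":
    # Invented example values.
    example = to_dublin_core({
        "title": ["Sediment samples, example site"],
        "creator": ["Muster, A."],
        "date": ["2024-03-15"],
        "format": ["text/csv"],
        "rights": ["CC BY 4.0"],
    })
    print(ET.tostring(example, encoding="unicode"))
```

Rejecting unknown field names early helps keep records uniform, which is the purpose of a standardized schema.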

Metadata schemas thus specify what information should be delivered. For the best possible search and use of the data, it is also important to provide this information in as uniform a format as possible. A number of discipline-specific and cross-discipline 'controlled vocabularies', thesauri, classifications and authority files are available for this purpose, such as:

An overview of different systems is provided, for example, by the Basel Register of Thesauri, Ontologies & Classifications (BARTOC) and Taxonomy Warehouse.

Documentation usually goes beyond describing data via metadata. It provides a deeper (scientific) description in which, for example, the context of origin, variables, instruments and methods are set out in detail. In many cases, such a description is indispensable for understanding, verifying and, if necessary, reusing the data.

Introductions to the topic of metadata are provided, for example, by the JISC Guide or the interactive Mantra course of the University of Edinburgh.

Data publication

Publishing your data offers advantages for the scientific system, but also for you personally.

Published data are available for subsequent use in new contexts, e.g. also for interdisciplinary questions or meta-analyses. This not only creates scientific added value, but also avoids duplication of work and saves costs.

By assigning permanent identifiers, your data can be permanently referenced and cited by yourself and others. This is a prerequisite for data publications to be recognized as an independent achievement and to enter the scientific reputation system. A study by Piwowar and Vision (2013) also shows a higher citation rate for publications where the underlying research data have been published.

Last but not least, in some cases publication simply fulfills the requirements of third parties. In addition to research funders, publication service providers are increasingly demanding that the research data on which a publication is based be made available. Some examples of such requirements are:

Publishing your data can be done in different ways:

  • Discipline-specific data repositories and centers (How do I find a suitable repository?). This usually represents the best solution.
  • Cross-discipline repositories such as Zenodo, Dryad or figshare (a comparison of the three repositories can be found here). This is more of a medium-term solution, as long-term archiving is not guaranteed. Across disciplines, the repository of the University of Kassel is also available (see "Archiving and publishing data").
  • Data supplements of journals, e.g. Nature. This is increasingly required, but should be complemented by other archiving strategies in view of long-term availability.

Data journals such as GigaScience, Earth System Science Data, or Journal of Chemical and Engineering Data (lists of Data Journals #1, #2) do not publish the data themselves, but descriptions of the data, not interpretations (documentation or data curation profiles). This reflects, not least, the fact that traditional articles offer hardly any space for the important and valuable description of the data.

There are both subject-specific or thematic repositories and generic repositories. Subject repositories and data centers (such as Pangaea for geoscientific data, GenBank, Protein Data Bank) are often the first choice, not least with regard to visibility in the subject community, but also with regard to conformity to subject-specific standards. An overview of subject repositories is provided by the Registry of Research Data Repositories (re3data.org) and the Open Access Directory to research data. A targeted search for subject repositories that also allow data storage is offered by the re3data-based RepositoryFinder.

When deciding on a particular repository, the following points can help you:

  • Is it a repository that fits the subject matter? Is it established and connected to specific search portals?
  • Does the repository offer the desired services (PIDs, open access, differentiated access rights (e.g. user agreements), realization of embargo periods)?
  • Is the sustainability of the repository guaranteed? Is there an exit strategy or an agreement to preserve the data in case of e.g. discontinuation of funding?
  • How are data transfer and data use regulated in terms of content and form?

The University of Kassel also provides all researchers who cannot or do not wish to use a subject-specific repository with an institutional repository (DaKS) (expected to be available from mid-January 2021), which fulfills both archiving and publication functions (see also "Archiving and publishing data"). This repository can also be used for student projects and theses.

In addition, interdisciplinary repositories for research data are available, such as the EU-funded ZENODO, Dryad or figshare.

    First of all, it is important that the data is available in a suitable format. Some repositories make stricter specifications here, others merely make recommendations or are open to all formats. This makes it all the more important to think about formats before the research begins. For general advice and specific links on formats, see What file formats are useful?

    In order for data to be found and used in a meaningful way, it must be documented in more detail through metadata. Please refer to the detailed notes at What are metadata, metadata schemas, and documentation?

    An upload to a repository does not automatically mean immediate publication. Under certain circumstances, there may be reasons for an embargo period or partial publication. Especially in business-related research disciplines, embargoes on research results are common. Therefore, consider whether there are weighty reasons against immediate publication. See Does anything speak against publication?

    Also consider the conditions under which you want to publish your data. There are different license models for this (Which license should I choose?).

    In many repositories, your datasets undergo curation before they are published or accepted in the repository. Among other things, the above-mentioned aspects are checked and you may receive suggestions for improvement or guidelines. The publication ‘Curation recommendations for HeFDI repositories’ provides an overview of the curation in DaKS. 

    Uploading your data does not equate to open access. In principle, you can also publish research data with a delay or only make the metadata accessible. In the case of actual publication, you can regulate the rights to access and edit in detail via the license or contracts (Can I then control the use of my data at all?). These possibilities can essentially be limited by:

    • the specific requirements and policies of your research funders and/or publishers
    • lack of/limited rights to the data
    • restrictions under data protection law
    • restrictions on the part of the repository

    There are constellations in which data should not be published or should only be published under certain conditions. The most important prerequisite for publication is that you have the right to do so (Who may decide on the disclosure and publication of data? Do I own the copyright to my data?).

    In addition, the data may be confidential or personal and may then only be published after anonymization or with the consent of the persons concerned (What data protection restrictions must I observe?).

Legal aspects

    Clarifying legal questions early helps to avoid misunderstandings and conflicts. Establishing at an early stage how data may be handled, for example what may be published and in what form, saves a lot of effort later on. Special attention must be paid to data protection, as compliance with the GDPR is essential.

    Copyright law and the General Data Protection Regulation (GDPR) apply in particular, but many other areas may be affected in individual cases (patent law, the Nagoya Protocol, licensing rights, etc.). In addition, there may be rules from university or funding agreements, as well as cooperation agreements. These laws and contractual provisions determine who may use or pass on which data. If you are unsure, the Research Data Service, the data protection officers or the university's legal staff can help.

    General Data Protection Regulation (GDPR)
    Legal specialists at the University of Kassel

    Personal data is information that relates to a person, e.g. name, voice or address, but also entire interviews or group observations. Such data may only be processed under certain conditions. There must be a legal basis, usually in the form of informed consent. As a rule, the data must be anonymized or pseudonymized before being passed on. In the case of so-called special categories of personal data (e.g. ethnic origin, political convictions, biological samples) or if vulnerable groups such as schoolchildren are affected, the ethics committee must also be involved. In addition, a so-called register of processing activities must always be drawn up, which also takes into account the technical security of the data. The university's data protection officers provide support here.

    Data protection officers of the University of Kassel
    Ethics committees of the University of Kassel

    Possible owners or co-owners of the rights to the data are the researchers, the employer, the client, research funders and/or (private sector) contractual partners. The contractual relationship determines who may or must be consulted about the disclosure or publication of research data. Normally, the results of research that is subject to instructions are the property of the employer or funder. The situation is different for in-house research, where researchers are allowed to decide on their own data. There are also exceptions where it is not possible to decide freely, e.g. personal data, data subject to export controls, business secrets or locations of endangered species.

    Research objects and occasionally also research data can be protected as works within the meaning of copyright law. These can be linguistic works, computer programs, musical works, pantomime works including works of dance art, works of fine art including works of architecture and applied art, photographic works, cinematographic works and representations of a scientific and technical nature.

    As a rule, however, research data lack the necessary level of creativity and are not works. However, it is possible that certain types of research data fall under an ancillary copyright, for example photographs, motion pictures or sound recordings.

    Often, though, the research data of a research project are protected by copyright as part of a database work or fall under the ancillary copyright for databases.

    Research data that is not covered by an intellectual property right can generally be used by anyone for any purpose without permission or payment obligation.

    It is important to note that there is still an obligation to disclose the origin and to cite the data correctly.

    First of all, regardless of whether copyrights or other rights to the data exist at all, it is the duty of every person who reuses the data of others to transparently explain the origin of the data and to cite you accordingly in accordance with good scientific practice.

    In addition, if you have a copyright or ancillary copyright to research data, you can regulate various aspects of use by means of appropriate contracts, such as the type and manner of use, user groups and time period, purpose, etc. As contractual regulations for individual cases would be very time-consuming in practice, there are various solutions for standardized regulations of usage rights. For example, the Leibniz Center for Psychological Information and Documentation (ZPID) offers standard contracts for the use of psychological data and GESIS regulates access restrictions for particularly sensitive social science data via usage contracts. If your data should not be subject to any specific access or usage restrictions, it is advisable to use standardized licenses such as Creative Commons or Open Data Commons.

    Publishing data under a specific license allows you to define in detail how the data may be used. Licenses create legal certainty for both the person providing the data and the person using it. For the same reason, if you wish to waive all restrictions, it is important to state this waiver explicitly.

    Although data itself is generally not subject to copyright, there is a case for treating it as potentially worthy of protection, not least in order to express one's own ideas on further use. Various license models are available for this purpose. The most common of these is 'Creative Commons' (CC). CC licenses are independent of the licensed content and cover copyrights, ancillary copyrights and, in the current version - if available - also database producer rights.

    Regardless of how legally binding it is in a given case, the CC BY license comes closest to the idea of open access and open science. The other license elements have drawbacks: 'redistribution under the same conditions' (ShareAlike, SA) can lead to compatibility problems with other licenses; the prohibition of editing (NoDerivatives, ND) can restrict uses such as data mining and cause problems with long-term archiving; and the prohibition of commercial use (NonCommercial, NC) makes it more difficult to include the data in commercial databases and thus potentially reduces the visibility of your research (for details see Paul Klimpel, 2012 [German text]).

    Whichever license you choose, you should make a conscious and informed decision. Tools such as the English-language License Chooser for CC licenses or its counterpart for software licenses are helpful here.
    Irrespective of the terms of use, the rules of good scientific practice naturally apply, which require the source of the data used to be stated.

Finding and using research data

    Research data is increasingly available for re-use, not least due to the requirements and recommendations of funding bodies, publishers and institutions for making data accessible. In order to find suitable research data for your own field of research, the first port of call is often relevant offerings from your own specialist area. These can be institutional or subject-specific repositories or data journals (↗ How can I publish data?). You can search for repositories - broken down by subject area - using the Repository Finder. A list of data journals - by no means exhaustive - can be found in the wiki forschungsdaten.org (↗ Data Journals).

    It is also possible to search for data across several repositories using generic search services. A major disadvantage of these search services is that they often cannot adequately map the detailed metadata schemas of their sources. In addition, the respective metadata differ greatly in terms of what they identify, i.e. individual data, data sets or collections.

    The best-known portals include:

    • BASE - Bielefeld Academic Search Engine - Retrieves metadata from repositories and databases via OAI-PMH. Research data can be found via the document type "Primary data".
    • B2FIND - Searches metadata from various sources such as CLARIN or GBIF.
    • DataCite Metadata Search - Searches metadata of information objects, including research data (object type 'Dataset'), which are registered with DOIs at DataCite. Some of the metadata is also queried by the other two services (BASE, B2FIND).
    • OpenAIRE - Contains freely accessible research results from EU-funded projects.
    • Google Dataset Search (proprietary!)
    • gesisDataSearch - Search for data on social and economic research in data repositories and metadata services.
    • VerbundFDB - Forschungsdaten Bildung - Search for studies, research data and instruments of empirical educational research.

    The applicable rights (licenses and, where relevant, user contracts) are binding for any subsequent use. Among other things, they determine who may use the data, for what purpose and for how long (↗ legal aspects).

    In order to be able to reuse research data, the quality of the data is crucial. Data quality in research data management includes the following areas in particular:

    • Data format (special storage formats of scientific data, such as vector format, raster format and property format, etc.)
    • Data completeness and data accuracy

    A free prototype of the Leibniz Data Manager is currently available, which serves as an example of similar tools:

    Leibniz Data Manager enables the visualization of different research data formats, allowing the 'screening' of datasets for their potential usefulness. As a visualization and management tool, it supports the administration and access to heterogeneous research data publications, and thus helps researchers to select relevant datasets for their respective disciplines.

    In order to adequately document the (re-)use of own and third-party research data in accordance with good scientific practice, correct data citation is essential.

    In the case of third-party data, this also acknowledges the scientific achievements of their 'authors'. As with the citation of other publications, the formal conventions for citing data may differ. In terms of content, however, they share the requirement that the data source be clearly identifiable. The FORCE11 Data Citation Synthesis Group has developed recommendations for data citation. According to these, a complete data citation comprises:

    Author(s), year, title of the research data, data repository or archive, version, worldwide persistent identifier

    Other optional information that can be useful in the context of a citation are edition, feature name and Uniform Resource Identifier (URI), resource type, publisher, Unique Numeric Fingerprint (UNF) and location (see Alex Ball & Monica Duke (2015). How to Cite Datasets and Link to Publications).
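
    The core elements listed above can be assembled into a citation string programmatically. This is a minimal sketch; the class name `DataCitation`, the punctuation of the output and all example values (including the DOI) are assumptions for illustration, and the required style of your journal or repository takes precedence.

```python
from dataclasses import dataclass


@dataclass
class DataCitation:
    """Core elements of a data citation as listed above."""
    authors: list[str]
    year: int
    title: str
    repository: str
    version: str
    identifier: str  # worldwide persistent identifier, e.g. a DOI

    def format(self) -> str:
        # The punctuation is one possible style, not a fixed standard.
        names = "; ".join(self.authors)
        return (
            f"{names} ({self.year}): {self.title}. {self.repository}. "
            f"Version {self.version}. {self.identifier}"
        )


if __name__ == "__main__":
    # Invented example values; the DOI is a placeholder.
    citation = DataCitation(
        authors=["Muster, A.", "Beispiel, B."],
        year=2024,
        title="Soil samples",
        repository="DaKS",
        version="v1-0-0",
        identifier="https://doi.org/10.1234/example",
    )
    print(citation.format())
```

    Keeping the elements as separate fields makes it easy to render the same record in different citation styles.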

    Unless otherwise noted, all text on this site and its subpages is licensed under a Creative Commons Attribution 4.0 International License.