How do you adequately describe your research output?

Author name: Anna Wałek

Introduction

Documentation and metadata are essential for understanding your data in detail, and they help other researchers find and re-use your data. They let you provide information about the dataset's content, the context in which the data were captured or created, and the origin of the dataset. Documentation and metadata are also essential in making your data FAIR (Findable, Accessible, Interoperable, Reusable).

Metadata

Metadata is data about data, and it plays an essential role in making your data FAIR. Add metadata to your research data continuously throughout a project, not just at the beginning or the end. Metadata can be added manually or automatically, preferably according to a disciplinary standard. From a FAIR perspective, metadata can matter even more than the data itself, because metadata should always be openly available and it links research data and publications in the Internet of FAIR Data and Services. While data documentation is meant to be read and understood by humans, metadata (sometimes a part of the documentation) is primarily meant to be processed by machines.
There are three main types of metadata, commonly distinguished as descriptive, structural, and administrative.

An excellent way to determine what metadata to capture is to think about the information someone else would need to understand your project and reuse the data. Many disciplines, repositories, or data centres use metadata standards or schemas. A metadata standard is a defined set of metadata fields that can be general or discipline-specific.
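As a minimal sketch of what a machine-readable metadata record could look like, the Python example below stores a few fields in a dictionary, checks that none of them is empty, and serializes the record to JSON. The field names and the list of required fields are assumptions made for illustration; they do not come from any particular standard, and a real project would follow the schema of its repository or discipline.

```python
import json

# Hypothetical minimal set of fields, chosen only for this illustration.
# A real project would use the fields required by its metadata standard.
REQUIRED_FIELDS = ("title", "creator", "date_created", "description", "license")


def missing_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Illustrative record; all values are made up.
record = {
    "title": "Example water quality measurements",
    "creator": "Jane Doe",
    "date_created": "2023-05-01",
    "description": "Weekly pH and temperature readings (illustrative only).",
    "license": "",  # left empty on purpose to show the check
}

# Report which fields still need to be filled in.
print(missing_fields(record))

# Serialize the record so that software, not only people, can read it.
print(json.dumps(record, indent=2))
```

Running such a check whenever data files change is one simple way to add metadata continuously rather than only at the end of a project.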

Metadata Standards

There are many metadata standards: general, discipline-specific, and institutional. General-purpose standards include Dublin Core, DataCite, and the Data Documentation Initiative (DDI), all of which are widely used. Other standards are used in particular fields and institutions, e.g., DC (life sciences), EML (ecology), SDMX (ECB, EUROSTAT, IMF, OECD, UN), SAFE (ESA), INSPIRE ISO 19139 (earth sciences), Project Open Data Metadata Schema v1.1 (US federal agencies), TEI and CDWA (humanities disciplines). Metadata standards specific to various disciplines can be found on the Digital Curation Centre (DCC) website.
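To illustrate how a general standard such as Dublin Core turns descriptive fields into something machines can process, the sketch below writes a handful of Dublin Core elements as XML using Python's standard library. The element names (title, creator, date, rights) and the namespace belong to the Dublin Core element set; the wrapping `metadata` element, the helper name `to_dublin_core`, and all values are assumptions for this example, since repositories usually define their own container format.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def to_dublin_core(fields: dict) -> str:
    """Serialize simple name/value pairs as Dublin Core elements in XML."""
    # "metadata" is a hypothetical wrapper element used only in this sketch.
    root = ET.Element("metadata")
    for name, value in fields.items():
        element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        element.text = value
    return ET.tostring(root, encoding="unicode")


# Illustrative values; none of them describe a real dataset.
xml_text = to_dublin_core({
    "title": "Example water quality measurements",
    "creator": "Jane Doe",
    "date": "2023-05-01",
    "rights": "CC BY 4.0",
})
print(xml_text)
```

Because every element carries the Dublin Core namespace, a harvester that knows the standard can pick out the title or creator without knowing anything else about the project.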

Documentation

Documentation is also needed to describe the data correctly. It covers the contextual and descriptive features of the data and all the information someone needs to understand the data and be able to use it. It matters at the dataset level (e.g. describing how the data were created) but also at the level of individual data elements (e.g. explaining what each variable means or the parameters for generating data files such as images). Examples include protocols, codebooks (with an explanation of concepts, names, variables, and abbreviations), lab journals, explanatory comments in code files, methodological information, and information about the structure of a dataset.
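A codebook can be as simple as a table with one row per variable. The sketch below, written under the assumption of a small tabular dataset, keeps the codebook as a list of entries, exports it as CSV, and reports any data columns that have no entry yet. The variable names, the column layout of the codebook, and both helper functions are hypothetical.

```python
import csv
import io

# Hypothetical codebook: one entry per variable in the dataset.
codebook = [
    {"variable": "site_id", "type": "string", "unit": "",
     "description": "Identifier of the sampling site"},
    {"variable": "ph", "type": "float", "unit": "pH",
     "description": "Acidity of the water sample"},
    {"variable": "temp_c", "type": "float", "unit": "degrees Celsius",
     "description": "Water temperature at sampling time"},
]


def codebook_to_csv(entries: list[dict]) -> str:
    """Render the codebook as CSV text with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["variable", "type", "unit", "description"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


def undocumented(columns: list[str], entries: list[dict]) -> list[str]:
    """Return the data columns that have no codebook entry."""
    documented = {entry["variable"] for entry in entries}
    return [column for column in columns if column not in documented]


print(codebook_to_csv(codebook))

# "depth_m" appears in the data but not in the codebook, so it is reported.
print(undocumented(["site_id", "ph", "depth_m"], codebook))
```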

Readme

A README file is a plain-text file with descriptive information, commonly shipped with software, games, and code. It is a supplementary document in which the creator explains the contents to the user. When working with data, it is useful to create a README file and include it with your data. This helps future users understand the data, the terms used, and more.

README files are a simple way to create documentation for a dataset.

The README file must be stored in the same location as the data. Its name must make clear which file or dataset it describes. The location of the README file in the folder structure can also show which dataset it belongs to. You can also provide a single README file for an entire research project.

It is essential to describe the context in which the data were gathered, the origin of the data, and the dataset's content. The person who opens the file has to understand what the dataset is about. You should also include any technical information needed to open the dataset, for example, the required software or specifications about file formats. For clarity, explain the dataset's structure and what the abbreviations mean. Information about rights to the dataset or the confidentiality of the data is best stated explicitly in the README file. The person opening the dataset must be aware of the legal implications of its use.
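The points above can be turned into a small template. The following sketch assembles a plain-text README from the kinds of information just listed and saves it next to the data file, under a name that refers to that file. The section headings, the `README_<dataset>.txt` naming pattern, the helper functions, and all example values are assumptions for illustration, not a prescribed format.

```python
import tempfile
from pathlib import Path


def build_readme(info: dict) -> str:
    """Assemble a plain-text README from the supplied descriptions."""
    # Hypothetical section layout covering the points a README should mention.
    sections = [
        ("Dataset", info["title"]),
        ("Context and origin", info["context"]),
        ("Content and structure", info["structure"]),
        ("Abbreviations", info["abbreviations"]),
        ("Software and file formats", info["technical"]),
        ("Rights and confidentiality", info["rights"]),
    ]
    lines = []
    for heading, body in sections:
        lines.extend([heading.upper(), body, ""])
    return "\n".join(lines)


def write_readme(data_file: Path, info: dict) -> Path:
    """Write the README into the same folder as the data file it describes."""
    readme_path = data_file.with_name(f"README_{data_file.stem}.txt")
    readme_path.write_text(build_readme(info), encoding="utf-8")
    return readme_path


# Illustrative values only.
info = {
    "title": "Example water quality measurements",
    "context": "Collected weekly during a hypothetical monitoring project.",
    "structure": "One CSV file, one row per sample.",
    "abbreviations": "ph = acidity; temp_c = temperature in degrees Celsius.",
    "technical": "UTF-8 encoded CSV; opens in any spreadsheet or text editor.",
    "rights": "CC BY 4.0; contains no personal data.",
}

# Demonstrate in a temporary folder so nothing is left on disk.
with tempfile.TemporaryDirectory() as folder:
    data_file = Path(folder) / "water_quality.csv"
    data_file.write_text("site_id,ph\nA1,7.2\n", encoding="utf-8")
    readme = write_readme(data_file, info)
    readme_name = readme.name
    readme_text = readme.read_text(encoding="utf-8")

print(readme_name)
print(readme_text)
```

Keeping the README as plain text means it can be opened without any special software, which matches its purpose as the first file a new user reads.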

Written by Anna Wałek

Ph.D., President of IATUL, Open Science and RDM expert at ACC Cyfronet AGH