E - 1 Introduction to data publication

Licenced under CC-BY 4.0 and OSI-approved licenses, see licensing.

Overview

Teaching: 15 min
Exercises: 0 min
Questions
  • Why submit my data to a repository?

  • What types of repositories are there?

  • How do I find a suitable repository?

Objectives
  • Explain why data should be publicly available.

  • Explain different types of repositories and how to find a suitable one.

Why submit your datasets to a repository?

Why should I share my data?

  • Open Science & FAIR - To meet the Open Science and FAIR requirements set by funders and society
  • Reproducibility - So that your published research results can be reproduced
  • Trail of evidence - To provide a record of the data's provenance
  • 3rd party access - To give others access to your data
  • Archival purposes - Research data should be available for as long as it is useful to someone
  • Publication of a paper requires it - Nowadays most publishers require you to deposit the underlying data in a repository when you publish a paper

FAIR data

Data publication is the best way to make the data from your research projects FAIR, because your data becomes:

  • Findable by being assigned a persistent identifier, and by being described with rich metadata.
  • Accessible by being placed in a resource that is searchable and allows easy access via the internet
  • Interoperable by using standard formats and languages to represent both the data and its metadata
  • Reusable by fulfilling the F, A, and I, and by having a clear and accessible data usage license

Repositories provide the technical solution for FAIR data. By submitting your data to a repository, you make it FAIR without having to build a solution of your own.

Note that, while we focus on life science research data, the same principles apply to any other type of research data.

What data should be submitted?

  • Raw data: this is the data that comes straight from the instrument, e.g. RNA sequencing reads in FASTQ format
  • Processed & analysis data: this is the data where some type of analysis or processing has been done, e.g. normalization, removal of outliers, expression measurements, statistics
  • Metadata: this is the description of the raw and processed data, e.g. in the form of minimum information to reproduce the data, sample information, precise protocols

How to find a suitable repository

Types of repositories

How to find a domain-specific repository?

EBI Repository Wizard

EBI hosts several life science repositories, suitable for different types of data. The Repository Wizard helps to identify which one is suitable for your data.