Overview
Teaching: 15 min
Exercises: 0 min
Questions
Why submit my data to a repository?
What types of repositories are there?
How do I find a suitable repository?
Objectives
Explain why data should be publicly available.
Explain different types of repositories and how to find a suitable one.
Why submit your datasets to a repository?
Why should I share my data?
- Open Science & FAIR - To meet the requirements from funders and society on Open Science & FAIR
- Reproducibility - So that your published research results can be reproduced
- Trail of evidence - To provide a provenance of the data
- 3rd party access - To give others access to your data
- Archival purposes - Research data should be available for as long as it is useful to someone
- Publication of paper requires it - Nowadays most publishers require you to submit the data to a repository when publishing a paper
FAIR data
Data publication is the best way to make your research projects FAIR since your data becomes:
- Findable by being assigned a persistent identifier, and by being described with rich metadata.
- Accessible by being put in a resource that is searchable and enables easy access via the internet.
- Interoperable by using standard formats and language to represent both the data and its metadata.
- Reusable by fulfilling the F, A, and I, and by having a clear and accessible data usage license
Repositories provide the technical solution for FAIR data. Hence, by submitting your data to a repository, you make it FAIR without having to build a solution of your own.
Note that, while we focus on life science research data, the same principles apply to any other type of research data.
What data should be submitted?
- Raw data: this is the data that comes straight from the instrument, e.g. RNA sequences in fastq format
- Processed & analysis data: this is the data where some type of analysis or processing has been done, e.g. normalization, removal of outliers, expression measurements, statistics
- Metadata: this is the description of the raw and processed data, e.g. in the form of minimum information to reproduce the data, sample information, precise protocols
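As a rough illustration of how the three kinds of data belong together in one submission, the sketch below bundles them in a single record and reports which parts are still missing. The field names and file names are hypothetical and are not taken from any repository's submission format; real repositories define their own metadata checklists.

```python
# Hypothetical submission bundle; field and file names are illustrative only.
REQUIRED_PARTS = ("raw_data", "processed_data", "metadata")


def missing_parts(submission):
    """Return the required parts that are absent or empty in a submission."""
    return [part for part in REQUIRED_PARTS if not submission.get(part)]


submission = {
    "raw_data": ["sample1.fastq.gz"],                  # straight from the instrument
    "processed_data": ["expression_normalized.tsv"],   # after normalization
    "metadata": {},                                    # sample info and protocols not filled in yet
}

print(missing_parts(submission))
```

A check like this is only a reminder to collect all three parts before submitting; it says nothing about whether the metadata is rich enough for the chosen repository.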
How to find a suitable repository
Types of repositories
- Domain-specific:
  - Best choice if suitable, long-term plan, typically free of charge, maximum reach
  - E.g. European Nucleotide Archive, ArrayExpress, PRIDE
- General purpose:
  - Second best, long-term plan, might cost (now or in future), good reach but less specific in metadata → more difficult for future users to judge if a dataset will be useful
  - E.g. Zenodo, (SciLifeLab) Figshare, Dryad
- In-house/institutional:
  - For archive/backup purpose mainly, might cost, limited reach unless also published in data catalogue
How to find a domain-specific repository?
- EBI repository wizard - guide depending on data type
- ELIXIR deposition databases - core resources with long-term data preservation and accessibility plans
- FAIRsharing.org/databases - catalogue of many repositories, with possibility to filter on e.g. domain
- Scientific Data Repository Guidance - publisher’s recommendation
EBI Repository Wizard
EBI hosts several life science repositories, suitable for different types of data. The Repository Wizard helps to identify which one is suitable for your data.
- Go to the Wizard at https://www.ebi.ac.uk/submission/
- Either explore the wizard with the purpose of finding a suitable repository for one of your projects, or choose among the scenarios provided below. Which repository is recommended?
  - Genomics project with RNA sequences
  - X-ray crystallography structure of a protein
  - Microarray gene expression data
  - Protein sequencing data
  - Proteomics project using mass spectrometry
  - Electron microscopy structure images
Solution
- Genomics project with RNA sequences: European Nucleotide Archive (DNA/RNA sequence -> no controlled access -> produced experimentally -> Other)
- X-ray crystallography structure of a protein: wwPDB OneDep (Structures -> X-ray crystallography)
- Microarray gene expression data: ArrayExpress (Expression data -> no controlled access -> Microarray gene expression)
- Protein sequencing data: UniProt SPIN (Protein data -> no controlled access -> produced experimentally -> Protein sequencing)
- Proteomics project using mass spectrometry: PRIDE (Protein data -> no controlled access -> produced experimentally -> Mass spectrometry -> Proteomics)
- Electron microscopy structure images: EMPIAR (Structures -> Electron microscopy -> micrographs or particle stacks)
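The wizard is essentially a decision tree: each sequence of answers leads to one recommended repository. The sketch below records the six paths from the solution as a lookup table. It is a snapshot transcribed from this exercise, not an interface to the wizard, and the `recommend` helper is a hypothetical name introduced only for illustration.

```python
# Answer paths transcribed from the solution list above. The wizard itself may
# change over time, so treat this table as a snapshot rather than an API.
WIZARD_PATHS = {
    ("DNA/RNA sequence", "no controlled access", "produced experimentally", "Other"):
        "European Nucleotide Archive",
    ("Structures", "X-ray crystallography"):
        "wwPDB OneDep",
    ("Expression data", "no controlled access", "Microarray gene expression"):
        "ArrayExpress",
    ("Protein data", "no controlled access", "produced experimentally", "Protein sequencing"):
        "UniProt SPIN",
    ("Protein data", "no controlled access", "produced experimentally", "Mass spectrometry", "Proteomics"):
        "PRIDE",
    ("Structures", "Electron microscopy", "micrographs or particle stacks"):
        "EMPIAR",
}


def recommend(*answers):
    """Return the repository for a full answer path, or None if the path is not in the table."""
    return WIZARD_PATHS.get(tuple(answers))


print(recommend("Structures", "X-ray crystallography"))
```

Paths that are not in the table return `None`, which is a cue to go back to the wizard or to one of the catalogues listed earlier.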