A - 4 Glossary

Licensed under CC-BY 4.0 and OSI-approved licenses; see licensing.

Abbreviations and terms frequently used during the course, presented in alphabetical order.

Branch (in Git): An independent line of development within a repository; changes made on a branch do not affect other branches until they are merged

Cloning (in Git): Copying the whole repository to your laptop for the first time

Clustering (in OpenRefine): Finding groups of different values that might be alternative representations of the same things
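
To make the idea of clustering concrete, here is a minimal sketch of key-based clustering. It is a simplified illustration, not OpenRefine's actual implementation: the `fingerprint` and `cluster` functions and the sample values are assumptions made up for this example. Values that reduce to the same normalized key are treated as candidates for the same thing.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Simplified key: lowercase, drop accents and punctuation, sort unique tokens."""
    text = unicodedata.normalize("NFKD", value.strip().lower())
    # Remove combining marks so that accented and unaccented spellings match.
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Remove punctuation, keeping word characters and whitespace.
    text = re.sub(r"[^\w\s]", "", text)
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Group values sharing a key; keep only groups with more than one spelling."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Hypothetical column values with inconsistent spelling.
values = ["Homo sapiens", "homo sapiens ", "Sapiens, Homo", "Mus musculus"]
clusters = cluster(values)
print(clusters)
```

In this sketch the three spellings of "Homo sapiens" end up in one cluster, while "Mus musculus" has no alternative spelling and is left out.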

Commit (in Git/GitHub): A saved change (version) of the project; each commit gets a unique identifier, the commit hash

Controlled Vocabulary: A list of terms describing a certain domain of knowledge; each phenomenon has one term with a unique identifier, a definition and synonyms

CSV: Comma-Separated Values; a text-file format for tabular data that uses commas to separate values
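
As a small illustration of the format, the sketch below reads CSV text with Python's standard `csv` module and writes the same table with tabs as the separator (TSV). The column names and values are hypothetical sample data, not taken from the course material.

```python
import csv
import io

# Hypothetical tabular data: the first line holds the column names.
csv_text = (
    "sample_id,organism,read_count\n"
    "S1,Homo sapiens,1200\n"
    "S2,Mus musculus,950\n"
)

# Each row becomes a dictionary keyed by column name; all values are strings.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Write the same rows again, this time separated by tabs.
out = io.StringIO()
writer = csv.DictWriter(
    out, fieldnames=list(rows[0].keys()), delimiter="\t", lineterminator="\n"
)
writer.writeheader()
writer.writerows(rows)
tsv_text = out.getvalue()

print(rows[0]["organism"])
print(tsv_text)
```

Using a CSV library rather than splitting lines on commas by hand matters as soon as a value itself contains a comma or a quote.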

Data Dictionary: File explaining the type of info in each (meta)data field 

DMP: Data Management Plan; a document addressing requirements & practices for managing a project’s data, code & documentation

Data publication plan: A plan for making research data available to the public

DSW: Data Stewardship Wizard; a tool to create DMPs

DOI: Digital Object Identifier; a kind of persistent identifier

EGA: European Genome-phenome Archive; a repository for human genetic and phenotypic data, where access to the data is controlled

EMBL-EBI: European Bioinformatics Institute; a UK-based academic research institute, part of the European Molecular Biology Laboratory (EMBL)

ENA: European Nucleotide Archive; a public repository for non-human genome sequence data

FAIR: Findable, Accessible, Interoperable, Reusable

  • Findable: Has a globally unique identifier, is indexed in a searchable resource, and is described by metadata

  • Accessible: Retrievable from a searchable resource online

  • Interoperable: Easily read, even by machines; uses vocabularies and refers to other (meta)data

  • Reusable: Has a clear usage license and clearly described, detailed (meta)data, so that the data can be reused by others

FAIR-ification: Making data FAIR 

Facet (in OpenRefine): A feature that gives an overview of the values in the data and helps make the data more consistent

Fetch (in GitHub): Download new changes from GitHub to the local repository; the changes are not merged (applied) locally

Forking (in Git/GitHub): Taking a copy of a repository (typically not yours); your copy (fork) stays in GitHub and you can make changes to it

IDE: Integrated Development Environment

Git: A Version Control System (VCS) & program that helps you keep track of changes in files

GitHub: A web-based hosting service for Git repositories

GitHub Desktop: An application to manage files with Git

JSON: JavaScript Object Notation; a text-based data format, used for saving OpenRefine scripts

Markdown (.md): A lightweight markup language for creating formatted text using a plain-text editor

Merge (in Git): The act of incorporating new changes (commits) from one branch into another

Metadata: Data about the data; context-dependent information that explains the data in a way that anyone could understand in detail what was done in a project and why

Metadata element: A single metadata field with a defined type of input value

Metadata standard: A collection of metadata fields or elements relevant to the documented data

Null values: Missing data; it is advisable to specify why a value is missing (e.g. not applicable, not collected)
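
One way to record why a value is missing is to reserve explicit placeholder codes and translate them when the data are read. The sketch below is only an illustration: the codes `NA` and `NC`, the function name `parse_field` and the sample values are assumptions, and a real project would define its own conventions in its data dictionary.

```python
# Hypothetical placeholder codes; a real project would define its own.
MISSING_REASONS = {
    "NA": "not applicable",
    "NC": "not collected",
}


def parse_field(raw):
    """Return (value, missing_reason); exactly one of the two is None."""
    text = raw.strip()
    if text in MISSING_REASONS:
        return None, MISSING_REASONS[text]
    if text == "":
        # An empty cell says nothing about why the value is absent.
        return None, "reason not recorded"
    return text, None


records = [parse_field(raw) for raw in ["37.2", "NA", "", "NC"]]
for value, reason in records:
    print(value, reason)
```

Keeping the reason next to the missing value lets a later reader tell "not measured" apart from "does not apply", which an empty cell alone cannot express.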

Ontology: A controlled vocabulary capturing relationships between agreed terms, includes hierarchies/trees

OpenRefine: Free, open source tool to clean and transform messy data

Open Science: The movement to make scientific research & its dissemination accessible to all levels of society

PID: Persistent Identifier; a unique, long-lasting identifier to a digital resource

Processed/analyzed data: Data that have been processed (e.g. normalized, annotated, statistically analyzed)

Public repository: Freely accessible online database

Pull (in GitHub): Download new changes from GitHub to the local repository; the changes are merged (applied) locally

Pull request (in Git/GitHub): A request to add a commit or a collection of commits to a repository.

Push (in GitHub): Upload new changes (commits) from the local repository to GitHub

Raw Data: Data coming directly from an instrument (e.g. fastq sequencing files) 

README file: A text file describing a file, folder or project, containing commonly required information

Remotes (in GitHub): Named links from a local repository to other copies of that repository, for example the copy hosted on GitHub

Repository (in Git): The project, contains all data & history

RStudio: Integrated Development Environment (IDE) for working with R

Tabular data: Data organized in a table with columns and rows

Tag (in Git): A pointer to one commit, to be able to refer to it later

Tar.gz file: A compressed archive file (can be generated when exporting files from OpenRefine)
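
For readers who want to look inside such an archive, the following sketch uses Python's standard `tarfile` module to create a small `.tar.gz` archive in memory and read it back. The file name `project/data.csv` and its contents are hypothetical; the layout of an archive exported from OpenRefine will differ.

```python
import io
import tarfile

# Build a small gzip-compressed tar archive in memory.
payload = b"sample_id,organism\nS1,Homo sapiens\n"
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
    info = tarfile.TarInfo(name="project/data.csv")  # hypothetical member name
    info.size = len(payload)
    archive.addfile(info, io.BytesIO(payload))

# Reopen the archive, list its members and read one file back.
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r:gz") as archive:
    names = archive.getnames()
    content = archive.extractfile("project/data.csv").read()

print(names)
print(content.decode("utf-8"))
```

To open an archive stored on disk instead, pass its path as the first argument, as in `tarfile.open("export.tar.gz", mode="r:gz")`, where the file name is again only an example.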

TSV: Tab-Separated Values; a text-file format for tabular data that uses tabs to separate values

Vocabularies: Standardized, FAIR ways of capturing info about the data