Abbreviations and terms frequently used during the course, presented in alphabetical order.
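Branch (in Git): An independent line of development within a repository; technically a movable pointer to a commit, so work on one branch does not affect other branches until they are merged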
Cloning (in Git): Copying a whole repository, including its history, from a remote location (e.g. GitHub) to your own computer, usually the first time you start working on it
Clustering (in OpenRefine): Finding groups of different values that might be alternative representations of the same thing (e.g. spelling variants)
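As an illustration of how clustering can work, the sketch below groups values by a simplified "fingerprint" key (trim, lowercase, remove accents and punctuation, then sort the unique words), similar in spirit to OpenRefine's fingerprint key-collision method. It is plain Python, not OpenRefine code, and the example values are invented.

```python
import string
import unicodedata
from collections import defaultdict

def fingerprint(value: str) -> str:
    """Simplified fingerprint key: values with the same key are likely the same thing."""
    # Trim, lowercase, and strip accents (e.g. "é" -> "e") by decomposing to ASCII
    value = unicodedata.normalize("NFKD", value.strip().lower())
    value = value.encode("ascii", "ignore").decode("ascii")
    # Remove punctuation
    value = value.translate(str.maketrans("", "", string.punctuation))
    # Split into words, drop duplicates, sort, and join back together
    return " ".join(sorted(set(value.split())))

def cluster(values):
    """Group values whose fingerprints collide; keep only groups with 2+ distinct values."""
    groups = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    return [sorted(g) for g in groups.values() if len(g) > 1]

print(cluster(["Helsinki University", "university of helsinki", "University, Helsinki", "Turku"]))
# [['Helsinki University', 'University, Helsinki']]
```

Note that "university of helsinki" is not grouped, because the extra word "of" changes its key; this is why OpenRefine offers several clustering methods to try.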
Commit (in Git/GitHub): A saved snapshot of the project that records a set of changes; each commit gets a unique identifier, the commit hash
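Git derives identifiers from content: an object's ID is a hash of a short header plus the object's content, so any change produces a new ID. A commit hash is computed this way from the commit object (which records the project snapshot, parent commit(s), author and message); the sketch below shows the simpler case of a file's content ("blob"), assuming a repository using Git's default SHA-1 object format, where it should match `git hash-object`.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """ID that Git (SHA-1 object format) gives to file content stored as a 'blob' object."""
    # Git hashes the header "blob <size in bytes>\0" followed by the raw content
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()

print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (empty file)
```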
Controlled Vocabulary: A list of terms describing a certain domain of knowledge, with one preferred term per concept; each term has a definition, a unique identifier and synonyms
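A controlled vocabulary can be used to replace free-text values with one preferred term and its identifier. In this minimal sketch the vocabulary, its identifiers (`EX:0001`, `EX:0002`) and definitions are hypothetical; real projects would use an established vocabulary.

```python
from typing import Optional, Tuple

# Hypothetical mini-vocabulary: identifiers, definitions and synonyms are invented for illustration
VOCABULARY = {
    "EX:0001": {"label": "Homo sapiens", "definition": "The human species.", "synonyms": ["human"]},
    "EX:0002": {"label": "Mus musculus", "definition": "The house mouse species.", "synonyms": ["mouse", "house mouse"]},
}

# Case-insensitive lookup from every preferred label and synonym to its identifier
LOOKUP = {
    name.lower(): term_id
    for term_id, entry in VOCABULARY.items()
    for name in [entry["label"], *entry["synonyms"]]
}

def standardize(value: str) -> Optional[Tuple[str, str]]:
    """Return (identifier, preferred label) for a known term or synonym, else None."""
    term_id = LOOKUP.get(value.strip().lower())
    if term_id is None:
        return None
    return term_id, VOCABULARY[term_id]["label"]

print(standardize(" Mouse "))  # ('EX:0002', 'Mus musculus')
```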
CSV: Comma-Separated Values; a plain-text file format for tabular data that uses commas to separate values
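A value that itself contains a comma is wrapped in double quotes so that it is not split into two columns. This sketch reads invented sample data with Python's standard `csv` module.

```python
import csv
import io

# Invented sample data; the "notes" value contains a comma, so it is wrapped in double quotes
csv_text = 'sample_id,site,notes\nS1,Helsinki,"collected, then frozen"\nS2,Turku,\n'

# DictReader uses the first line as column names and returns one dict per row
rows = list(csv.DictReader(io.StringIO(csv_text)))
for row in rows:
    print(row["sample_id"], "|", row["notes"])
```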
Data Dictionary: A file explaining the type of information stored in each (meta)data field
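A data dictionary can also be used to check data automatically. In this minimal sketch the field names, types and descriptions are hypothetical; each record is checked for missing, wrongly typed and undocumented fields.

```python
# Hypothetical data dictionary: field names, types and descriptions are illustrative only
DATA_DICTIONARY = {
    "sample_id": {"type": str, "description": "Unique identifier of the sample"},
    "age_years": {"type": int, "description": "Age of the donor in whole years"},
    "weight_kg": {"type": float, "description": "Body weight in kilograms"},
}

def check_row(row: dict) -> list:
    """Return a list of problems found when checking one record against the dictionary."""
    problems = []
    for field, spec in DATA_DICTIONARY.items():
        if field not in row:
            problems.append(f"missing field: {field}")
            continue
        try:
            spec["type"](row[field])  # e.g. int("forty") raises ValueError
        except ValueError:
            problems.append(f"{field}: {row[field]!r} is not {spec['type'].__name__}")
    for field in row:
        if field not in DATA_DICTIONARY:
            problems.append(f"undocumented field: {field}")
    return problems

print(check_row({"sample_id": "S2", "age_years": "forty"}))
```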
DMP: Data Management Plan; a document addressing requirements & practices for managing a project’s data, code & documentation
Data publication plan: A plan for making research data available to the public
DOI: Digital Object Identifier; a kind of persistent identifier
DSW: Data Stewardship Wizard; a tool to create DMPs
EGA: European Genome-phenome Archive; a repository for sensitive human genetic and phenotypic data, where access to the data is controlled
EMBL-EBI: European Bioinformatics Institute; a UK-based academic research institute, part of the European Molecular Biology Laboratory (EMBL)
ENA: European Nucleotide Archive; a public repository for nucleotide sequence data, typically used for non-human sequence data (sensitive human data go to EGA)
FAIR: Findable, Accessible, Interoperable, Reusable
- Findable: Has a globally unique identifier, is described by metadata and is listed in searchable resources
- Accessible: Can be retrieved online from a searchable resource
- Interoperable: Can easily be read by machines as well as humans, uses shared vocabularies and refers to other (meta)data
- Reusable: Has a clear usage license and is described in enough detail that others can reuse the (meta)data
FAIR-ification: Making data FAIR
Facet (in OpenRefine): A feature that summarizes the values in a column, giving an overview of the data and helping to make it more consistent
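A text facet lists each distinct value in a column together with how many rows contain it, which makes inconsistent spellings easy to spot. This is a plain-Python sketch of that idea (not OpenRefine code), with invented values.

```python
from collections import Counter

def text_facet(column):
    """List each distinct value with its row count, most frequent first."""
    return Counter(column).most_common()

print(text_facet(["Helsinki", "helsinki", "Helsinki", "Turku"]))
# [('Helsinki', 2), ('helsinki', 1), ('Turku', 1)]
```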
Fetch (in Git/GitHub): Download new changes from a remote repository (e.g. on GitHub) to your local repository without merging (applying) them to your local files
Forking (in GitHub): Making your own copy of a repository (typically not yours); your copy (the fork) stays on GitHub and you can change it without affecting the original
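Git: A version control system (VCS); a program that keeps track of changes to files
GitHub: A web-based hosting service for Git repositories
GitHub Desktop: A graphical application for working with Git repositories and GitHub without using the command line
IDE: Integrated Development Environment; an application combining a code editor with tools for running and debugging code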
JSON: JavaScript Object Notation; a text-based data format, used e.g. by OpenRefine to save the history of operations (scripts) so that they can be reapplied
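The idea of saving cleaning steps as JSON and reapplying them can be sketched as below. The step format (`step`, `column`) and the step names are hypothetical placeholders, not OpenRefine's actual operation schema.

```python
import json

# Hypothetical "recipe" in JSON; NOT OpenRefine's real operation history format
recipe_json = '[{"step": "trim", "column": "site"}, {"step": "lower", "column": "site"}]'

# Map each hypothetical step name to a Python function that transforms one value
STEPS = {"trim": str.strip, "lower": str.lower}

def apply_recipe(rows, recipe_text):
    """Parse a JSON list of steps and apply each one, in order, to the named column."""
    for step in json.loads(recipe_text):
        func = STEPS[step["step"]]
        for row in rows:
            row[step["column"]] = func(row[step["column"]])
    return rows

print(apply_recipe([{"site": "  Helsinki "}, {"site": "TURKU"}], recipe_json))
# [{'site': 'helsinki'}, {'site': 'turku'}]
```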
Markdown (.md): A lightweight markup language for creating formatted text using a plain-text editor
Merge (in Git): Incorporating new changes (commits) from one branch into another
Metadata: Data about the data; context-dependent information that explains the data so that anyone can understand in detail what was done in a project and why
Metadata element: A single defined field in a metadata standard, with a specified type of input value
Metadata standard: A collection of metadata fields or elements relevant to the documented data
Null values: Missing data; it is advisable to specify why a value is missing (e.g. not applicable, not collected)
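Recording why a value is missing keeps that information usable in analysis. In this minimal sketch the codes `NA` and `NC` are hypothetical; a real project would define and document its own codes (e.g. in the data dictionary).

```python
# Hypothetical missing-value codes; document your own codes in the data dictionary
MISSING_CODES = {"NA": "not applicable", "NC": "not collected"}

def parse_value(raw: str):
    """Return (value, reason_missing); the value is None whenever the cell is missing."""
    raw = raw.strip()
    if raw in MISSING_CODES:
        return None, MISSING_CODES[raw]
    if raw == "":
        return None, "unknown"  # empty cell with no explanation
    return raw, None

print([parse_value(v) for v in ["12", "NC", ""]])
```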
Ontology: A controlled vocabulary that also captures relationships between the agreed terms, including hierarchies (trees)
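The hierarchy in an ontology lets software find broader terms automatically, so that a search for "rodent" can also match data annotated with a narrower term. The "is a" relations below are a hypothetical toy example, not taken from a real ontology.

```python
# Hypothetical "is a" relations (term -> broader terms); real ontologies define their own terms and IDs
IS_A = {
    "house mouse": ["mouse"],
    "mouse": ["rodent"],
    "rat": ["rodent"],
    "rodent": ["mammal"],
    "human": ["primate"],
    "primate": ["mammal"],
}

def ancestors(term: str) -> set:
    """Collect every broader term reachable by following 'is a' links upwards."""
    found = set()
    stack = list(IS_A.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(IS_A.get(parent, []))
    return found

print(sorted(ancestors("house mouse")))  # ['mammal', 'mouse', 'rodent']
```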
OpenRefine: Free, open source tool to clean and transform messy data
Open Science: The movement to make scientific research & its dissemination accessible to all levels of society
PID: Persistent Identifier; a unique, long-lasting identifier for a digital resource
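A DOI, one common kind of PID, is turned into a working link by appending it to the doi.org resolver address. In this minimal sketch the example DOI `10.1234/example.5678` is made up, and characters that would need URL-encoding are not handled.

```python
def doi_to_url(doi: str) -> str:
    """Turn a DOI into a resolvable link via the doi.org resolver."""
    doi = doi.strip()
    # Accept DOIs written with a "doi:" prefix or already given as a doi.org link
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # A DOI has a prefix starting with "10." and a suffix, separated by "/"
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    return "https://doi.org/" + doi

print(doi_to_url("doi:10.1234/example.5678"))  # https://doi.org/10.1234/example.5678
```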
Processed/analyzed data: Data that have been processed (e.g. normalized, annotated, statistically analyzed)
Public repository: Freely accessible online database
Pull (in Git/GitHub): Download new changes from a remote repository (e.g. on GitHub) to your local repository and merge (apply) them to your local files
Pull request (in GitHub): A request to merge a commit or a collection of commits from one branch or fork into another, so that the changes can be reviewed first
Push (in Git/GitHub): Upload commits from your local repository to a remote repository (e.g. on GitHub)
Raw data: Data coming directly from an instrument (e.g. FASTQ sequencing files)
README file: A text file describing a file, folder or project and containing commonly required information
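Remotes (in Git/GitHub): Links from a local repository to other copies of it, usually hosted on GitHub (the default name for the remote you cloned from is origin)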
Repository (in Git): The project, containing all its files and their full history of changes
RStudio: Integrated Development Environment (IDE) for working with R
Tabular data: Data organized in a table with columns and rows
Tag (in Git): A named pointer to one specific commit (e.g. a release version), so that it can be referred to later
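The difference between branches and tags can be shown with a toy model: both are names pointing at commits, but a new commit moves the current branch forward while tags stay where they were created. The `ToyRepo` class and the commit IDs `c1`, `c2`, `c3` are hypothetical; this models the behavior only, not how Git stores references or links commits to their parents.

```python
class ToyRepo:
    """Toy model of Git branch and tag behavior; not Git's real implementation."""

    def __init__(self, first_commit: str):
        self.branches = {"main": first_commit}
        self.tags = {}
        self.current = "main"

    def commit(self, new_commit: str):
        # A new commit moves the current branch forward; tags do not move
        self.branches[self.current] = new_commit

    def branch(self, name: str):
        # A new branch starts at the commit the current branch points to
        self.branches[name] = self.branches[self.current]

    def switch(self, name: str):
        self.current = name

    def tag(self, name: str):
        self.tags[name] = self.branches[self.current]

repo = ToyRepo("c1")
repo.tag("v1.0")
repo.commit("c2")
print(repo.branches, repo.tags)  # {'main': 'c2'} {'v1.0': 'c1'}
```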
Tar.gz file: A compressed archive file (e.g. generated when exporting a project from OpenRefine)
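A .tar.gz file is a tar archive (several files bundled into one) compressed with gzip. This generic sketch uses Python's standard `tarfile` module to create and then list such an archive in a temporary folder; the file names are invented and do not describe what an OpenRefine export contains.

```python
import os
import tarfile
import tempfile

def make_and_list_archive() -> list:
    """Pack two small files into a .tar.gz archive, then read back the names it contains."""
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("data.csv", "README.md"):
            with open(os.path.join(tmp, name), "w") as f:
                f.write("example\n")
        archive = os.path.join(tmp, "project.tar.gz")
        # "w:gz" = write a tar archive compressed with gzip
        with tarfile.open(archive, "w:gz") as tar:
            for name in ("data.csv", "README.md"):
                tar.add(os.path.join(tmp, name), arcname=name)
        # "r:gz" = open a gzip-compressed tar archive for reading
        with tarfile.open(archive, "r:gz") as tar:
            return sorted(tar.getnames())

print(make_and_list_archive())  # ['README.md', 'data.csv']
```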
TSV: Tab-Separated Values; a plain-text file format for tabular data that uses tabs to separate values
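TSV files can be read with the same tools as CSV files by changing the separator. This sketch uses Python's standard `csv` module with `delimiter="\t"` on invented sample data.

```python
import csv
import io

# Invented sample data: columns are separated by tab characters ("\t")
tsv_text = "sample_id\tsite\nS1\tHelsinki\nS2\tTurku\n"

# Same reader as for CSV, only the delimiter changes
rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
print(rows[0]["site"])  # Helsinki
```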
Vocabularies: Standardized sets of terms for capturing information about the data in a consistent, FAIR way