Overview

Teaching: 20 min
Exercises: 5 min

Questions

What are good practices for organising files and documentation to support your project?

Objectives

Understand what impact file organisation can have on a project or working group

Adopt good practices for naming and organising files

Design naming conventions for a project or working group

About this episode

This episode addresses some of the reasons why file organisation is important for data management and presents a selection of good practices for organising research folders and files. The aim is to get you started and thinking about what will work for you and your project or team. Depending on your research area and the type of research you’re involved in, you may find a better way to organise your work.

Good practices for organising files and folders

Depending on your background and experience, you may think of different reasons why it would be useful to organise your research and data files systematically. The following is a selection of common reasons:

Easier to locate a file
Find similar files together
Moving files becomes much easier
Easy to identify which files you want to back up
Keep organised in the long-run
Increases productivity
Helps you to keep and maintain a record of the project
Projects can easily be understood by others (including your future self)

It’s natural for some of your files to become unorganised from time to time—perhaps your downloads or desktop folder—and in those cases there may be multiple copies and versions of files cluttering your view and making it challenging to find what you’re looking for. You can avoid this clutter by planning for organising your files ahead of time, and any system is better than none.

[Figure: Unorganised files on desktop]

In this context we will be looking into practices for classifying and structuring files and folders to make them more useful. Your guiding principle should be that someone unfamiliar with your project should be able to look at your files and understand, in detail, what you did and why. This someone could be a researcher who wants to reproduce the results in your article, a new collaborator who needs to understand the details of your experiments, or—more commonly—that someone could be your future self not remembering what you were up to when you created a particular set of files. Poor organisation practices can lead to significantly slower research progress and you may end up having to spend significant time reproducing results from previous experiments or completely reconstructing an analysis to address minor flaws, new data or a new technique.

How to organise files and folders

Spend some time planning how you are going to organise your data at the beginning of a project. Consider how you and others will look for and access the files throughout the project’s life cycle and ensure that all people involved can commit to using the folder hierarchy, file naming conventions, and a strategy for onboarding new contributors. You can start small and expand as you develop your practices.

Organise files hierarchically

Folders are containers for your files and are sometimes called directories. A folder can contain other folders, sometimes called subfolders (or subdirectories), and you can organise your files hierarchically by creating a structure of folders and subfolders. Each folder corresponds to a category that should be mutually exclusive with other folders at the same level. And since a file can only be placed in one folder, at one place in the hierarchy, you should aim to create a structure that makes it easy for yourself and your collaborators to determine where any given file should be located.
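
A hierarchy like this can also be created from a short script, so that every new project starts from the same layout. The following is a minimal Python sketch using the standard `pathlib` module. The folder names in `SKELETON` and the project name `my-project` are hypothetical examples, not a recommended standard, and the script writes to a temporary directory so that it can be tried safely.

```python
from pathlib import Path
import tempfile

# Hypothetical layout: one mutually exclusive category per folder.
SKELETON = ["data/raw", "data/processed", "docs", "results", "src"]


def create_skeleton(root, folders=SKELETON):
    """Create the folders under root and return all folders found, sorted."""
    root = Path(root)
    for relative in folders:
        # parents=True also creates intermediate folders such as "data".
        (root / relative).mkdir(parents=True, exist_ok=True)
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_dir()
    )


# Build the hierarchy in a temporary directory and list what was created.
with tempfile.TemporaryDirectory() as tmp:
    created = create_skeleton(Path(tmp) / "my-project")
    print("\n".join(created))
```

Keeping the layout in one list makes it easy to share with collaborators and to adapt as the project grows.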

In cases where files need to be dispersed across several storage solutions, it can be a good idea to imagine a virtual top level of the folder hierarchy where each subfolder corresponds to a storage solution. This virtual hierarchy can be described in a shared document to allow your collaborators to determine on which storage solution any given file should be located.

Use folders to divide files into categories

Put each project in its own folder named after that project. Ideally you want to keep the folder’s name under 32 characters long while at the same time including a combination of the project title, a unique identifier and the date.
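
One way to apply this advice is to build the folder name from its parts and check the length automatically. The sketch below is only an illustration: the order of the parts (date, identifier, title), the underscore separator, and the example values are assumptions, and your project may prefer a different combination.

```python
from datetime import date

MAX_LENGTH = 32  # the name should stay under this many characters


def project_folder_name(title, identifier, start):
    """Combine date, identifier and title into one folder name."""
    # date.isoformat() gives YYYY-MM-DD, which sorts chronologically.
    name = f"{start.isoformat()}_{identifier}_{title}"
    if len(name) >= MAX_LENGTH:
        raise ValueError(f"{name!r} has {len(name)} characters; shorten the title")
    return name


# Hypothetical project: title "soil-survey", identifier "p042".
folder = project_folder_name("soil-survey", "p042", date(2021, 3, 1))
print(folder)
```

Raising an error for names that are too long forces the choice of a shorter title up front instead of silently truncating it.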

Consider the best hierarchy for the files in the project and decide whether a deep or shallow hierarchy is preferable. If you have several independent data collections, it is advisable to create a separate data folder for each collection. But you can use any meaningful characteristic or file attribute as a basis for organising your files; which of them will be most helpful varies widely across domains and specific projects.

The following examples illustrate some caveats in naming files, folders and versions.

[Figure: examples of file, folder and version naming]

The following examples illustrate how nesting can vary.

[Figure: examples of folder nesting]

Discussion

What characteristics do you use to create folders and subfolders in your projects?
E.g., data type, collection year, …

Examples of helpful characteristics or file attributes

Year or other date

Type of data, document or file

Project stages

Analysis version or revision

Experiments

Instruments

Time periods

Geographic location

Storage requirements

Team member, institution or project site

Choose a file naming strategy

Two important starting points for your file naming strategy are:

A file name is a principal identifier of a file

Good file names provide useful clues to the content, status and version of a file, uniquely identify a file and help in classifying and sorting files. File names that reflect the file content also facilitate searching and discovering files. In collaborative research, it is essential to keep track of changes and edits to files via the file name.

File naming strategy should be consistent in time and among different people

In both quantitative and qualitative research file naming should be systematic and consistent across all files in the study. A group of cooperating researchers should follow the same file naming strategy and file names should be independent of the location of the file on a computer.

Create documentation files

Systematically documented research data is the key to making the data publishable, discoverable, citable and reusable. Clear and sufficiently detailed documentation improves the overall data quality. It is vital to document both the study for which the data has been collected and the data itself. These two levels of documentation are called project-level and data-level documentation and can both be saved as files and stored in your file hierarchy.

A common practice is to use plain text files – a very simple form of text document that contains only characters. These files are often called README files and can be placed strategically across your file hierarchy to help people find what they want or direct them to where further project-level or data-level documentation can be found.

Examples

How many folders and how deeply nested subfolders you use will depend on many factors, such as the complexity or length of the project or the number of collaborators working on a project. A multi-year project with many contributors and subcontractors may result in lots of folders and files.

You should try to establish a structure early and be prepared to adapt it as needed to remain organised throughout the duration of the project. The best way to organise your folders depends on the attributes that are most helpful to your project and, again, on how many files you will be working with.

The following is a structure proposed for computational biology projects. The data folder is for storing our fixed data sets, and we see that the data sets are organised by date. You can see README files in each of the yeast and worm data folders. The src folder contains source code, the bin folder contains compiled binaries and scripts, and the results directory is for tracking computational experiments performed on the data. This could be a helpful structure to follow if you are running a computational experiment.

[Figure: example folder structure for a computational biology project]

How to name files and folders

A File Naming Convention is a framework, or protocol if you like, for naming your files in a way that describes what the files contain and, importantly, how they relate to other files.

Bad	Better
myabstract.docx	2014-06-08_abstract-for-sla.docx
Joe’s Filenames Use Spaces and Punctuation.xlsx	joes-filenames-are-getting-better.xlsx
figure 1.png	fig01_scatterplot-talk-length-vs-interest.png
fig 2.png	fig02_histogram-talk-attendance.png
JW7d^(2sl@deletethisandyourcareerisoverWx2*.txt	1986-01-28_raw-data-from-challenger-o-rings.txt

Discussion

What are examples of potential benefits of agreeing on a File Naming Convention for a project?

Potential benefits of a File Naming Convention

Easier to process - team members won’t have to overthink the file naming process

Easier to facilitate access, retrieval and storage of files

Easier to browse through files, saving time and effort

Harder to lose!

Having logical and known naming conventions in place can also help you with version control (See Version Control for more information).

Check for obsolete or duplicate records

Three principles for (file) names:

Machine readable – avoid spaces and accented characters, use punctuation deliberately, keep letter casing consistent
Human readable – the name describes the content of the file; this connects to the concept of a slug from semantic URLs
Plays well with default ordering – put something numeric first, use the ISO 8601 standard for dates, left pad other numbers with zeros
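
The first two principles can be combined by turning a free-text title into a slug. The following Python sketch shows one possible approach using only the standard library. The function name `slugify` and the exact rules (lower case, hyphens between words, accents dropped) are assumptions chosen for illustration, not a fixed standard.

```python
import re
import unicodedata


def slugify(text):
    """Turn free text into a lower-case, hyphen-separated ASCII slug."""
    # Split accented characters into base letter plus accent, then drop
    # everything that is not plain ASCII (accents, curly quotes, ...).
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_text = decomposed.encode("ascii", "ignore").decode("ascii")
    # Replace every run of characters other than a-z and 0-9 with one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower())
    return slug.strip("-")


print(slugify("Joe’s Filenames Use Spaces and Punctuation"))
print(slugify("Figure 1: Résumé"))
```

The result contains no spaces, accents or mixed casing, but the words remain readable for a human.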

Search and filtering friendly

Excerpt of complete file listing:

Same using Mac OS Finder search facilities:

Same using regex in R:
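
A comparable filter can be sketched in Python with the standard `re` module. The file names below are hypothetical and only serve to show how consistently named files can be selected with a single pattern, while an inconsistently named file is left out.

```python
import re

# Hypothetical listing: three files follow a convention, one does not.
file_names = [
    "2021-03-14_assay-a_plate-01.csv",
    "2021-03-14_assay-b_plate-01.csv",
    "2021-03-15_assay-a_plate-02.csv",
    "notes on plate 2 (final).docx",
]

# ISO 8601 date, then the "assay-a" unit, then anything ending in .csv.
pattern = re.compile(r"^\d{4}-\d{2}-\d{2}_assay-a_.*\.csv$")

matches = [name for name in file_names if pattern.search(name)]
print(matches)
```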

Encode/extract metadata from filenames

Deliberate use of “-” and “_” allows recovery of meta-data from the filenames:

“_” underscore used to delimit units of meta-data I want later.
“-” hyphen used to delimit words so my eyes don’t bleed.

This happens to be R but also possible in the shell, Python, etc.
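
As a minimal Python sketch of the same idea, the function below splits a file name on “_” to get the units of meta-data and on “-” to get the words. It assumes a hypothetical convention of an ISO 8601 date followed by one description unit, as in the file name `1986-01-28_raw-data-from-challenger-o-rings.txt` from the table above. The function name `parse_filename` is made up for this example.

```python
from datetime import date
from pathlib import Path


def parse_filename(filename):
    """Recover date, words and extension from a date_description.ext name."""
    path = Path(filename)
    # "_" separates the units of meta-data: the date and the description.
    date_text, description = path.stem.split("_", 1)
    return {
        "date": date.fromisoformat(date_text),
        # "-" separates the words inside the description.
        "words": description.split("-"),
        "extension": path.suffix.lstrip("."),
    }


info = parse_filename("1986-01-28_raw-data-from-challenger-o-rings.txt")
print(info)
```

This only works because the two delimiters are used for different purposes; a name that mixes them freely cannot be split reliably.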

Descriptive filenames

Which set of file(name)s do you want at 3 a.m. before a deadline?

Embrace the slug

Plays well with default ordering

Chronological order:

Logical order: Put something numeric first

Dates: Use the ISO 8601 standard for dates: YYYY-MM-DD

Left pad other numbers with zeros

If you don’t left pad, you may get this:

 10_final-figs-for-publication.R
 1_data-cleaning.R
 2_fit-model.R
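
The effect of padding can be checked with a few lines of Python. The sketch below sorts the three names from the listing above as plain text, then pads the numeric prefix to two digits and sorts again. The helper `pad_prefix` is a hypothetical name, and the width of two digits is an assumption that suits fewer than 100 files.

```python
names = [
    "10_final-figs-for-publication.R",
    "1_data-cleaning.R",
    "2_fit-model.R",
]


def pad_prefix(name, width=2):
    """Left pad the number before the first underscore with zeros."""
    number, rest = name.split("_", 1)
    return f"{int(number):0{width}d}_{rest}"


# Plain text sorting compares character by character, so "10_" sorts first.
unpadded_order = sorted(names)

# After padding, text order and numeric order agree.
padded_order = sorted(pad_prefix(name) for name in names)

print(unpadded_order)
print(padded_order)
```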

Create your own File Naming Convention

Briney, Kristin A. (2020) File Naming Convention Worksheet. [Teaching Resource] (Unpublished)

What group of files will this naming convention cover?

What information (metadata) is important about these files and makes each file distinct?

Do you need to abbreviate any of the metadata or encode it?

What is the order for the metadata in the file name?

What characters will you use to separate each piece of metadata in the file name?

Will you need to track different versions of each file?

Write down your naming convention pattern

Document this convention in a README.txt (or save this worksheet) and keep it with your files
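
Once the worksheet is filled in, the resulting pattern can be written down as a small function so that everyone builds names the same way. The Python sketch below assumes a hypothetical convention with four pieces of metadata (date, project code, description, version) separated by underscores. Your own answers to the questions above will likely give a different pattern.

```python
from datetime import date


def build_filename(day, project, description, version, extension):
    """Build a name following the pattern date_project_description_vNN.ext."""
    parts = [
        day.isoformat(),      # ISO 8601 date: YYYY-MM-DD
        project,              # short project code
        description,          # words separated by hyphens
        f"v{version:02d}",    # version number, left padded with zeros
    ]
    return "_".join(parts) + "." + extension


# Hypothetical example values.
file_name = build_filename(date(2014, 6, 8), "sla", "abstract-draft", 3, "docx")
print(file_name)
```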

Further reading

ELIXIR (2021) Data organisation. In Research Data Management Kit. A deliverable from the EU-funded ELIXIR-CONVERGE project (grant agreement 871075).

Briney, Kristin A. (2020) File Naming Convention Worksheet. [Teaching Resource] (Unpublished)

Key Points

Organisation is a key aspect of data management and will help keep the project on track by saving time and minimising risk

Think hard at the beginning of your project about how you are going to organise your data as it grows

Structure project folders hierarchically to divide data into categories that can easily be understood and described

Consider what makes sense for your project and research team, and how people new to the project might look for data files and documentation