Putting time and consideration into how content will be stored makes it easier to share. As stated earlier, reproducibility and replicability depend on thorough documentation. They also depend on organizing research outputs so that they can be easily identified, collated, and packaged for open sharing.
When working on research projects, there are often many files that need to be stored on a computer. These files may include:
- Raw data files
- Processed data files: you may need to take the raw data and process it in some way
- Code and scripts
- Outputs like figures and tables
- Writing associated with your project
The following provides best practices around organizing files that should be incorporated into research workflows, including open workflows.
Importance of File Names
As new files are created on a researcher’s computer, a carefully crafted naming convention needs to be developed that makes it easier for anyone to find things and also to understand what each file does or contains.
It is good practice to use file and directory names that are:
- Human readable: use expressive names that clearly describe what the directory or file contains (e.g. code, data, outputs, figures).
- Machine-readable: avoid special characters (e.g. & or accents) and spaces. Instead of spaces, you can use _ to separate words within the name to make them easy to read and parse.
- Sortable: it is nice to be able to sort files to quickly see what is there and find what you need. For example, you can create a naming convention for a list of related directories or files (e.g. 02-terry.jpg, etc.), which will result in sortable files. The most important part of a naming convention is that it’s consistent; beyond that, order information as it is useful for ways that you would want to sort your files.
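A minimal sketch of why a zero-padded numeric prefix keeps files sortable. The helper `numbered_name` and the label `sam` are hypothetical, introduced only for illustration; the pattern follows the `02-terry.jpg` example above.

```python
def numbered_name(index, label, ext="jpg", width=2):
    """Return a name such as 02-terry.jpg with a zero-padded numeric prefix."""
    return f"{index:0{width}d}-{label}.{ext}"


# Without padding, names sort as text, so "10" lands before "2".
unpadded = sorted(["2-terry.jpg", "10-sam.jpg"])

# With padding, text order and numeric order agree.
padded = sorted([numbered_name(2, "terry"), numbered_name(10, "sam")])

print(unpadded)  # ['10-sam.jpg', '2-terry.jpg']
print(padded)    # ['02-terry.jpg', '10-sam.jpg']
```

The padding width should match the largest number you expect: two digits cover up to 99 files, three digits up to 999.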
Learn more about best practices in file naming and look at examples here:
These guidelines not only help you to organize your directories and files, but they can also help you to implement machine-readable names that can be easily queried or parsed using scientific programming or other forms of scripting.
Using a good naming convention when structuring a project directory also supports reproducibility by helping others who are not familiar with your project quickly understand your directory and file structure.
For practical examples of naming conventions, watch the following video on document naming:
Scenario – Video Interview File Naming Conventions
Let’s consider this scenario: Professor Sam Meyers is performing over 200 interviews with first-generation undergraduate students for their research on information retrieval. Professor Meyers is developing a file naming convention to ensure the data is easily retrieved on their shared drive for future use. Their study is currently titled “IR Study.” The interviewees have been labelled 1 to 200 for anonymity. The first video recording occurred on November 5, 2020, and was performed by their graduate research assistant Karina Cassidy.
Professor Meyers and their research team decided to include the following information in the file naming conventions:
- Project Name
- Date of Data Collection
- Content of the File
- Researcher’s Initials
- Interviewee Label
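One way the team could assemble these five elements into a file name is sketched below. The element order follows the list above, but the underscore separator, the lower-case style, the three-digit interviewee label, the content word `interview`, and the `.mp4` extension are all assumptions made for illustration, not details given in the scenario. The function name `build_file_name` is likewise hypothetical.

```python
from datetime import date


def build_file_name(project, collected, content, initials, label, ext="mp4"):
    """Join the naming elements with underscores (an assumed convention)."""
    parts = [
        project.replace(" ", "").lower(),  # "IR Study" -> "irstudy"
        collected.strftime("%Y%m%d"),      # date as YearMonthDay
        content.lower(),                   # what the file contains
        initials.lower(),                  # researcher's initials
        f"{label:03d}",                    # 1 -> "001", so 1 to 200 sort correctly
    ]
    return "_".join(parts) + "." + ext


# Interviewee label 1 is used here only as an example value.
name = build_file_name("IR Study", date(2020, 11, 5), "interview", "KC", 1)
print(name)  # irstudy_20201105_interview_kc_001.mp4
```

Generating names from a function rather than typing them by hand helps keep all 200 files consistent.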
Best Practices for File Naming
Computer Readable Conventions
If your files follow identifiable patterns or rules, it will allow you to more easily manipulate them. This in turn will make it easier for you to automate file processing tasks.
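As an illustration of this point, the sketch below parses file names back into their parts with a regular expression. It assumes a hypothetical pattern of the form `project_YYYYMMDD_content_initials_label.ext`; the pattern and the function name `parse_file_name` are illustrative choices, not a prescribed convention.

```python
import re
from datetime import datetime

# Assumed pattern: project_YYYYMMDD_content_initials_label.ext
PATTERN = re.compile(
    r"^(?P<project>[a-z0-9]+)_(?P<date>\d{8})_(?P<content>[a-z0-9]+)"
    r"_(?P<initials>[a-z]+)_(?P<label>\d+)\.(?P<ext>[a-z0-9]+)$"
)


def parse_file_name(name):
    """Return the naming elements as a dict, or None if the name does not fit."""
    match = PATTERN.match(name)
    if match is None:
        return None
    fields = match.groupdict()
    try:
        fields["date"] = datetime.strptime(fields["date"], "%Y%m%d").date()
    except ValueError:
        return None  # eight digits that are not a real calendar date
    fields["label"] = int(fields["label"])
    return fields


info = parse_file_name("irstudy_20201105_interview_kc_001.mp4")
print(info["date"], info["initials"], info["label"])  # 2020-11-05 kc 1
print(parse_file_name("Final Version (2).mp4"))       # None
```

Names that do not follow the pattern come back as `None`, which makes stray or misnamed files easy to spot before an automated step runs.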
A few other best practices to consider when naming files within a project:
- Avoid spaces: spaces in a file name can be difficult when automating workflows.
- Use dashes-to-separate-words (slugs): dashes or underscores can make it easier for you to create expressive file names. Dashes or underscores are also easier to parse when coding.
- Use the ISO Date Standard: write dates as YearMonthDay (e.g. 20201106), so that sorting files by name also sorts them chronologically.
- Consider whether you may need to sort your files. If you do, you may want to number things.
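The practices above can be sketched as a small helper that turns a free-form title into a slug. The function name `slugify` is hypothetical, and converting to lower case and stripping accents are assumptions of this sketch rather than requirements stated in the list.

```python
import re
import unicodedata


def slugify(text, separator="-"):
    """Convert free text into a lower-case, ASCII-only, separator-joined slug."""
    # Split accented characters into base letter + accent, then drop the accents.
    ascii_text = (
        unicodedata.normalize("NFKD", text)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Replace every run of characters other than a-z and 0-9 with the separator.
    slug = re.sub(r"[^a-z0-9]+", separator, ascii_text.lower())
    return slug.strip(separator)


print(slugify("Case Study & Résumé"))    # case-study-resume
print(slugify("Field notes 2020", "_"))  # field_notes_2020
```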
Consistent File Names
It might be tempting when naming files and directories to use upper case. Case can cause coding issues for you down the road, particularly if you are switching between operating systems (Mac vs Linux vs Windows).
To keep things simple and to avoid case sensitivity issues, using lower case for all file and directory names is suggested.
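A quick way to check for this kind of problem is to look for names that differ only by case, since such files can coexist on a case-sensitive file system but may collide on a case-insensitive one. The sketch below is one possible check; the function name `find_case_collisions` and the sample file names are hypothetical.

```python
from collections import defaultdict


def find_case_collisions(names):
    """Group names that become identical once case is ignored."""
    groups = defaultdict(list)
    for name in names:
        groups[name.lower()].append(name)
    # Keep only the groups where more than one spelling exists.
    return [sorted(group) for group in groups.values() if len(group) > 1]


print(find_case_collisions(["Data.csv", "data.csv", "notes.txt"]))
# [['Data.csv', 'data.csv']]
```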
However, consistency is key when developing naming conventions, and you can choose to use CamelCase instead. CamelCase is the practice of writing phrases without spaces or punctuation, marking the start of each word with a capital letter; the first word may start with either case (e.g. instead of Case study, you would write CaseStudy).
Use Meaningful (Expressive) Names
Expressive file names are those that are meaningful and thus describe what each directory or file is or contains. Using expressive file names makes it easier to scan a project directory and quickly understand where things are stored and what files do or contain. Consistency is key: decide on the naming elements and their structure, then follow those conventions for all project files.
Example Naming Elements and Structure
Sample Named File
Additional information about file naming and how it relates to open data can be found in the Open Data module.
Create a README File
A readme file at the top level of your project is a standard convention. The readme is a file that describes data/software packages and tools used to process data in your project. The readme should also describe files, associated naming conventions and other details important to understanding the files.
Additional information about readme files and how they relate to open data can be found in the Open Data module.
Proprietary File Formats
Proprietary formats are formats that require a specific tool (and often a specific license) to open. Examples include Excel (.xls) or Word (.doc). These formats may change over time as new versions come out (example: the .xls format was upgraded in later versions of Excel).
When choosing file formats for a project, it’s important to consider ongoing access to the license of the tool and whether others have access as well.
Choosing formats that are operating system and tool-agnostic such as .txt is one way to avoid the issues related to tool licenses and to increase the potential for sharing fully usable content.
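Before sharing a project, it can help to list the files that are stored in proprietary formats. The sketch below is one simple way to do that; the extension set contains only the two examples named in this section and would need extending for a real project, and the function name `flag_proprietary` is hypothetical.

```python
from pathlib import PurePath

# Only the two examples named in this section; extend as needed (assumption).
PROPRIETARY_EXTENSIONS = {".xls", ".doc"}


def flag_proprietary(file_names):
    """Return the names whose extension is in the proprietary set."""
    return [
        name
        for name in file_names
        if PurePath(name).suffix.lower() in PROPRIETARY_EXTENSIONS
    ]


print(flag_proprietary(["results.xls", "notes.txt", "Draft.DOC"]))
# ['results.xls', 'Draft.DOC']
```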
The Open Software module provides insight into open source tools.
Adapted from Lesson 3. How To Organize Your Project: Best Practices for Open Reproducible Science by Jenny Palomino, Leah Wasser, Max Joseph (Earth Lab) licensed under a CC BY-NC-ND 4.0 LICENSE.