Metadata is essential for data discovery, sharing and reuse. As “data about data”, metadata describes the study, files and variables, answering the questions:
- Who created the data?
- What is the content of the data?
- When were the data created?
- Where are the data located geographically?
- How were the data developed?
- Why were the data developed?
There are three levels of metadata:
- Descriptive: Provides information that helps people understand what they will find in the dataset and its context. Project title, authors, keywords and collection methods are all types of descriptive metadata. Be sure to describe your variables and give them clear names. Include basic metadata in your file names.
- Administrative: Provides the information needed to manage and use the data. What software is required to use the data? What is the license attached to the data?
- Structural: Demonstrates how the data files relate to one another. You can link data files to each other and to the related publication.
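As a rough illustration of the three levels, a metadata record could be organized as in the sketch below. Every field name and value here is a hypothetical example chosen for illustration and does not come from any particular repository's standard.

```python
import json

# Hypothetical metadata record; all field names and values are illustrative
# assumptions, not part of an established metadata standard.
metadata = {
    "descriptive": {
        "title": "Example stream temperature survey",
        "authors": ["A. Researcher"],
        "keywords": ["temperature", "streams"],
        "collection_method": "Hourly readings from in-stream loggers",
        "variables": {
            "site_id": "Sampling site identifier",
            "temperature_c": "Water temperature in degrees Celsius",
        },
    },
    "administrative": {
        "software_required": "Any text editor or spreadsheet program",
        "license": "CC BY 4.0",
    },
    "structural": {
        "files": ["stream_temperature_2020.csv", "site_locations.csv"],
        "relationships": "site_id links stream_temperature_2020.csv to site_locations.csv",
        "related_publication": None,  # fill in once a publication exists
    },
}

# Serialize to JSON so the record is readable by both people and machines.
metadata_json = json.dumps(metadata, indent=2)
print(metadata_json)
```

In practice the repository you choose will dictate the actual field names, so treat this only as a way to think about which information belongs at which level.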
An established metadata standard provides common terms, definitions and structure, and may vary depending on the repository you select. Each repository will have its own standard but will be consistent in its use of common terminology, definitions, language and structure. Good metadata ensures that your files are both human and machine readable. Different disciplines may have their own standards or may have adopted a specific metadata standard.
When you deposit your data into a repository, you are required to complete metadata fields as part of the deposit. The more metadata you provide, the easier your data will be to discover.
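Before depositing, it can help to check a draft record for gaps. The sketch below assumes a hypothetical list of required fields; a real repository defines its own required fields, and the names used here are illustrative only.

```python
# Hypothetical list of required fields; a real repository defines its own.
REQUIRED_FIELDS = ["title", "authors", "description", "keywords", "license"]


def missing_fields(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or empty in the record."""
    return [name for name in required if not record.get(name)]


# Draft record with an empty description and no license (illustrative values).
draft = {
    "title": "Example stream temperature survey",
    "authors": ["A. Researcher"],
    "description": "",
    "keywords": ["temperature"],
}

print(missing_fields(draft))  # ['description', 'license']
```

An empty result means every field in the assumed list has a value; it says nothing about whether the values are accurate or detailed enough to support discovery.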
*Content adapted from UBC Library Research Commons’ Research Data Management Workshop 2020.
Read this data story from DataONE and consider the discussion point questions at the end of the story.
Metadata? I thought you were in charge of that.
Review the metadata from one or all of the examples in the links below. Click on the metadata tab to see the metadata fields included for the dataset. Notice whether the metadata includes geospatial metadata as well.
- 2006 Census of Canada – Selected Characteristics for Housing – Vancouver, Toronto CMAs at the Census Tract (CT) Level [custom tabulation] 004
- U.S. Cost Indices for Asphalt Concrete and Portland Cement Concrete Highway Construction: General Construction and Maintenance & Rehabilitation
- Replication Data for: “Local cold adaption increases the thermal window of temperate mussels in the Arctic”
File Naming and Structure
File naming and structure are important parts of managing your data and make it easier for others to use. In keeping with the FAIR principles, data should be stored in a format that is open, unencrypted and uncompressed.
An open format is non-proprietary, so the file can be opened with open software that is not owned by a specific company. For example, instead of saving as an Excel file, save as CSV or XML.
Unencrypted data is easily accessible to anyone, whereas encrypted data is locked and requires a passcode or key to unlock.
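As a minimal sketch of saving tabular data in an open format, the example below writes a few rows as CSV text using Python's standard csv module. The column names, values and the helper name rows_to_csv are hypothetical and chosen only for illustration.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Return the rows (a list of dicts) as CSV text with a header line."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Illustrative data; the column names and values are made up for this sketch.
rows = [
    {"site_id": "A1", "temperature_c": 12.5},
    {"site_id": "B2", "temperature_c": 13.1},
]

csv_text = rows_to_csv(rows, fieldnames=["site_id", "temperature_c"])
print(csv_text)
```

The resulting text can be written to a file ending in .csv and opened in any text editor or spreadsheet program, which is what makes the format open.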
An uncompressed file is one that is stored in the original format and hasn’t been compressed into another format. The UK Data Archive provides a list of recommended formats.