Data repositories provide a persistent location where data files and associated documentation are archived and preserved. They also maintain administrative, bibliographic, and licensing metadata.

Institutional and disciplinary repositories are two possible routes for publishing data. An example of an institutional repository is UBC’s Scholars Portal Dataverse. This repository is generalist and discipline agnostic: any researcher can create an account and deposit data using the provided metadata template, which includes the metadata fields needed to enhance discovery. For disciplinary repositories, the Registry of Research Data Repositories hosts a list of repositories, with links, that you can search or browse by subject to find a resource relevant to research in your field.

Your choice of repository will depend in part on why you’re depositing your data. Is it open to support a publication, open to encourage its re-use as a standalone dataset, or open to contribute to a larger dataset like a gene databank or a databank of biological observations?

Regardless of the route you choose, make sure the repository will provide a persistent identifier such as a DOI. The repository record should also include data about you, the creator, to help users cite your data; subject or keyword fields to help with discovery; information about the file types your data is stored in; and how your data is licensed, to help with sharing and reuse.
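As a rough illustration, the elements mentioned above can be treated as a checklist to run before you deposit. The short Python sketch below assumes a simple dictionary as the deposit record. The field names and the example values (including the DOI) are hypothetical placeholders; each repository defines its own metadata template.

```python
from __future__ import annotations

# Hypothetical field names; real repositories define their own metadata schemas.
REQUIRED_FIELDS = ("identifier", "creators", "keywords", "file_types", "license")


def missing_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or empty in a deposit record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


# Example record with made-up values; the DOI is a placeholder, not a real one.
record = {
    "identifier": "https://doi.org/10.1234/example",
    "creators": ["A. Researcher"],
    "keywords": ["conservation biology"],
    "file_types": ["csv"],
    "license": "",  # left empty on purpose
}

# The empty license is reported as missing.
print(missing_fields(record))
```

A check like this does not replace the repository's own validation, but it can catch an empty license or missing keywords before you start the deposit form.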

Scenario – Data persistence

Marija was a researcher at a university and published an article in the journal Conservation Biology. The article provided additional information about related data sources and a database with citations that could be downloaded from Marija’s faculty page on the institution’s website. This worked until Marija left that university and the faculty page was disabled. After that, readers who sought the data related to the Conservation Biology article were directed to an “Access forbidden” page, meaning that the data and related sources for the article could not be found.

When the institution started its own data repository, Marija could deposit that data and related files. This provided a persistent URL. Because the URL is designed to be persistent, Marija’s data should be accessible for a long time, no matter where she is based.

Generalist data repositories:

Choosing a generalist data repository is an option if there is no discipline-specific repository in your field. Your institution may have its own repository to store your data, providing long-term access and preservation. Listed below is a selection of generalist repositories.

Scholars Portal Dataverse: The Scholars Portal Dataverse is a publicly accessible data repository where affiliated researchers can deposit and share research data openly. It is a Canadian portal hosted by the University of Toronto Libraries. The affiliated members are primarily Canadian universities. If you have questions about UBC’s Scholars Portal Dataverse, contact research.data@ubc.ca.

Dryad: The Dryad Digital Repository is a curated resource that makes research data discoverable, freely reusable, and citable. Dryad provides a general-purpose home for a wide diversity of data types.

Figshare: Figshare allows users to make all of their research outputs available in a citable, shareable and discoverable manner. Figshare is cloud-based and features the ability to preview data.

Zenodo: Zenodo accepts data in any format and lets depositors choose the access restrictions and license for each upload. The metadata describing each record, rather than the data itself, is licensed CC0.

What about OSF?

As an open workflow tool, OSF is a good option when working with raw data or temporarily storing analyzed data, but it is not ideal for long-term preservation and storage. OSF includes add-ons for data storage options such as Dataverse and Figshare. Creating a data component in OSF and adding your data file or files to the component with a well-structured readme can be a good way to share your data and link that data to and within your OSF project. The publishing option in OSF allows for flexibility in how you provide access to your project’s data. However, repositories offer more granular, data-specific metadata options, the ability to assign user access at the file level rather than the component level, and preservation services, so they are recognized as the preferred locations for long-term preservation.

Licensing Open Data

A license can be applied when depositing into a repository. This step is often neglected, but without a license, no one knows how they can use the data or how they are expected to give credit for your hard work. The data repository may have a default license, so be sure to check the license options and select an open license that suits how you want to share your data and that provides for attribution while enabling the user to download, reuse, and repurpose the data.

Choosing a License

During the POSE program, different types of open licenses are discussed in relation to different aspects of open scholarship. Licensing data raises its own questions and considerations. Check with the funding agency or journal that requires your data to be made open, as they might specify a license to be applied to your data. If you are choosing your own licensing option, select the appropriate one based on how you want others to reuse your data. It is good practice to include a rights statement within your dataset or in your readme file.
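One way to keep a rights statement consistent across datasets is to generate it from a few values. The Python sketch below is only an illustration: the function name, the wording of the statement, and the example title and creator are assumptions, and your funder, journal, or repository may require different wording.

```python
def rights_statement(title: str, creator: str, year: int,
                     license_name: str, license_url: str) -> str:
    """Build a short rights section to paste into a dataset readme."""
    return (
        "Rights\n"
        "------\n"
        f"{title} (c) {year} {creator}.\n"
        f"Licensed under {license_name}: {license_url}\n"
        "Please cite this dataset when you reuse it.\n"
    )


# Example values are placeholders; substitute your own dataset details.
text = rights_statement(
    title="Example dataset",
    creator="A. Researcher",
    year=2024,
    license_name="CC BY 4.0",
    license_url="https://creativecommons.org/licenses/by/4.0/",
)
print(text)
```

The printed section can be pasted at the end of a readme file so that the license travels with the data even when the files are downloaded separately from the repository record.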

The following table, created by JISC, compares different data licenses and can help you select an appropriate one.

The Public License tool linked below is a selector that helps you decide on the appropriate license for your dataset:

Choose a License


Scenario adapted from Case study: Data persistence with permission from Stanford Libraries.


POSETest Copyright © by luc. All Rights Reserved.
