6.1 What is Open Data
This module focuses on open research data. Data is often central to research and is a key aspect of open scholarship.
In the digital age, data is the raw material on which discoveries are built, and unfettered access to research data, whether in the Life Sciences or the Social Sciences, is crucial to accelerating progress in research. Data plays a central role in our ability to predict and counter natural disasters, understand human biology, and develop advances in computing technology.
Despite its tremendous importance, research data today remains largely fragmented: isolated across millions of individual computers and blocked by disparate technical, legal, and financial restrictions.
The amount of scientific and scholarly data grows exponentially each year, yet we still lack the infrastructure, policies, and practices to harness this vital resource. While some high-profile projects, such as the Human Genome Project and the Large Hadron Collider, make their data openly accessible, too often data isn’t shared beyond those who generate it. The Internet was built by researchers to share data, but data sharing isn’t yet the norm in research.
Text Adapted from Setting the Default to Open, by SPARC Europe used under a CC-BY License.
Open Data: an Introduction
To understand what is meant by open data, start by watching this quick video introduction from the University of Guelph. Questions about open data at UBC can be directed to research.data@ubc.ca.
The video gives a broad overview of open data, but as researchers and open scholars we need to understand the concept in more detail. Let’s start with a simple definition and then break it down further.
Open data is structured, machine-readable data that can be freely shared, used, and built on without restrictions.
“Open data is data that can be freely used, re-used and redistributed by anyone – subject only, at most, to the requirement to attribute and sharealike.”
From the Open Data Handbook https://opendefinition.org/
The Open Definition provides more detail, and the Open Data Handbook summarizes its most important points:
- Availability and Access: The data must be available as a whole and at no more than a reasonable reproduction cost, preferably by downloading over the internet. The data must also be available in a convenient and modifiable form.
- Reuse and Redistribution: The data must be provided under terms that permit reuse and redistribution, including the intermixing with other datasets.
- Universal Participation: Everyone must be able to use, reuse, and redistribute. There should be no discrimination against fields of endeavour or against persons or groups. For example, ‘non-commercial’ restrictions that would prevent ‘commercial’ use, or restrictions of use for certain purposes (e.g. only in education), are not allowed.
As the definition above shows, open data is much more than making data available. It requires an established workflow for collecting, managing, and storing data, and it should adhere to standards for making that data open. The FAIR principles, described in the next section, are standards that can help researchers ensure their data is truly open. Applying the FAIR principles from the planning stage onward, and packaging your data and code together with metadata and documentation in one container, is the best way to make your data both future-proof and open.
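One way to keep data, metadata, and documentation together is a self-describing folder. The minimal sketch below assumes the Frictionless Data Package convention (a `datapackage.json` descriptor stored next to the data); this module does not prescribe any particular container format, and the dataset name, columns, values, and README text are hypothetical.

```python
import csv
import json
import tempfile
from pathlib import Path


def package_dataset(folder: Path, rows: list[dict], fields: list[dict]) -> Path:
    """Write data, metadata, and documentation side by side in one folder."""
    folder.mkdir(parents=True, exist_ok=True)

    # 1. The data itself, in an open, machine-readable format (CSV).
    with (folder / "data.csv").open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=[fld["name"] for fld in fields])
        writer.writeheader()
        writer.writerows(rows)

    # 2. Machine-readable metadata: title, license, and a schema for each column.
    #    Layout assumed from the Frictionless Data Package convention.
    descriptor = {
        "name": "example-stream-temperatures",  # hypothetical dataset name
        "title": "Example stream temperatures (illustrative)",
        "licenses": [{
            "name": "CC-BY-4.0",
            "path": "https://creativecommons.org/licenses/by/4.0/",
        }],
        "resources": [{
            "name": "data",
            "path": "data.csv",
            "format": "csv",
            "schema": {"fields": fields},
        }],
    }
    (folder / "datapackage.json").write_text(
        json.dumps(descriptor, indent=2), encoding="utf-8"
    )

    # 3. Human-readable documentation for people who open the folder.
    (folder / "README.txt").write_text(
        "Example stream temperatures. Column definitions and license terms "
        "are in datapackage.json.\n",
        encoding="utf-8",
    )
    return folder


if __name__ == "__main__":
    fields = [
        {"name": "site", "type": "string", "description": "Sampling site code"},
        {"name": "temp_c", "type": "number", "description": "Water temperature (°C)"},
    ]
    rows = [{"site": "A1", "temp_c": 12.4}, {"site": "B2", "temp_c": 13.1}]
    out = package_dataset(Path(tempfile.mkdtemp()) / "my_dataset", rows, fields)
    print(sorted(p.name for p in out.iterdir()))
    # ['README.txt', 'data.csv', 'datapackage.json']
```

Because the license and column definitions travel with the CSV, anyone who receives the folder (or a repository that ingests it) can interpret and reuse the data without contacting the original researcher.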
The FAIR Principles
Related to open access, open government, and open source, open data ensures public access to data and should include sufficient detail for others to know how the data can be reused or repurposed. Open data intended for reuse should follow the FAIR principles for sharing data: it should be findable, accessible, interoperable, and reusable (Wilkinson et al., 2016). These principles provide guidance for scientific data management and stewardship and are relevant to all stakeholders in the current digital ecosystem.
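To make the four letters of FAIR more concrete, the sketch below checks a metadata record for one obvious signal per principle. It is a deliberately rough illustration, not an official FAIR assessment (the full principles contain many more detailed sub-principles), and the field names, the list of open formats, the DOI, and the contact address are all hypothetical.

```python
# Illustrative only: a hypothetical metadata record and a rough check of which
# FAIR principles it gives at least minimal support for.
OPEN_FORMATS = {"text/csv", "application/json", "text/plain"}  # assumed examples


def fair_gaps(record: dict) -> list[str]:
    """Return obvious FAIR gaps in a metadata record (empty list if none found)."""
    gaps = []
    # Findable: a persistent identifier (such as a DOI) and a descriptive title.
    if not (record.get("identifier") and record.get("title")):
        gaps.append("Findable: add a persistent identifier and a title")
    # Accessible: a way to retrieve the data, or to request access if restricted.
    if not (record.get("access_url") or record.get("access_contact")):
        gaps.append("Accessible: give a download URL or an access-request contact")
    # Interoperable: a widely supported, openly documented file format.
    if record.get("format") not in OPEN_FORMATS:
        gaps.append("Interoperable: use an open, documented file format")
    # Reusable: a clear license stating the terms of reuse.
    if not record.get("license"):
        gaps.append("Reusable: state a license")
    return gaps


if __name__ == "__main__":
    record = {
        "identifier": "https://doi.org/10.1234/example",  # hypothetical DOI
        "title": "Example stream temperatures",
        "access_contact": "data-requests@example.org",      # hypothetical contact
    }
    for gap in fair_gaps(record):
        print(gap)
    # Interoperable: use an open, documented file format
    # Reusable: state a license
```

Note that an access-request contact is enough to satisfy the "Accessible" check here: FAIR data is not automatically open data, which is why restricted datasets can still be made findable.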
Dig Deeper
Read research scientist Brian Lavoie’s view of what the future of FAIR might look like:
- The Future of FAIR, as Told by the Past by Brian Lavoie
Privacy and Open Data
While open data should be governed by the FAIR principles to promote transparency and reproducibility, not all data can or should be made open. There may be privacy, ethical, or cultural issues to consider.
The FAIR principles provide guidance for sharing data, but they assume that ethical and contractual obligations will be upheld.
Some data may contain personal or other sensitive information that should not be readily accessible. For example, open data that contains the location of a rare plant or species at risk may further endanger that species if others are able to locate it. We should also consider what personal information is collected along with the data and whether it is ethical to disseminate that information widely. Consultation with a Research Ethics Board may be necessary before making personal information open. In Canada, the Tri-Council Policy Statement: Ethical Conduct for Research Involving Humans (TCPS 2) provides guidelines for research and the use of research data.
Data can often be de-identified, and repositories may offer options to keep data closed or embargoed. Good documentation and metadata make sensitive data discoverable without exposing the data itself. Even when data is closed, a best practice is to let others know it exists: write a description of your data using a metadata standard so others can find it, cite it, and request access if that is an option.
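As a minimal sketch of what de-identification can involve, the example below drops direct identifiers, replaces a participant ID with a keyed-hash pseudonym (HMAC-SHA-256), and coarsens coordinates to one decimal place (roughly 11 km of latitude), the kind of generalization that could protect the location of a species at risk. The column names, key handling, and rounding precision are assumptions for illustration. Pseudonymized data is not anonymous: combinations of the remaining fields can still identify people, so any real plan should go through your Research Ethics Board.

```python
import hashlib
import hmac
import secrets

# Hypothetical column names that directly identify a person.
DIRECT_IDENTIFIERS = {"name", "email"}


def deidentify(rows: list[dict], key: bytes, decimals: int = 1) -> list[dict]:
    """Return de-identified copies of rows; the input rows are not modified."""
    out = []
    for row in rows:
        # Drop direct identifiers entirely.
        clean = {k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS}
        # Replace the participant ID with a keyed hash (a pseudonym). Without the
        # secret key, a pseudonym cannot simply be recomputed from a guessed ID.
        pid = str(clean.pop("participant_id"))
        digest = hmac.new(key, pid.encode("utf-8"), hashlib.sha256).hexdigest()
        clean["pseudonym"] = digest[:12]
        # Coarsen coordinates so an exact location is not exposed.
        for coord in ("lat", "lon"):
            if coord in clean:
                clean[coord] = round(float(clean[coord]), decimals)
        out.append(clean)
    return out


if __name__ == "__main__":
    key = secrets.token_bytes(32)  # store securely, separately from shared data
    rows = [{
        "participant_id": 17, "name": "A. Person", "email": "a@example.org",
        "age_band": "30-39", "lat": 49.2827, "lon": -123.1207,
    }]
    print(deidentify(rows, key))
```

Using a keyed hash rather than a plain hash matters: participant IDs are often short and guessable, so an unkeyed hash could be reversed by simply hashing every possible ID.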
Policies and Open Data
Open data may be required by granting agencies or journals as part of their agreement for funding or publication. For example, the Tri-Agency Statement of Principles on Digital Data Management stipulates that “research data collected with the use of public funds belong, to the fullest extent possible, in the public domain and available for reuse by others” (Tri-Agency Statement of Principles on Digital Data Management, 2016).
The related Tri-Agency Research Data Management Policy, issued in draft form, recommends that institutions have a strategy in place to support researchers in fulfilling the data management requirements of their funding, and once implemented it will require researchers to deposit their data in a recognized repository.
Reflection
Read the Tri-Agency Research Data Management Policy and reflect on how it might affect the way you manage your data.
Further reading:
- Support for the Tri-Agency Statement of Principles on Digital Data Management at UBC
- Canada’s Open Data Initiative
- Visit the Portage learning module RDM 101: Canadian Policy Review for further reading and background on open research data.
Publishers
Publishers also increasingly require open data, with policies of varying strength. There are even journals dedicated entirely to publishing data, such as Scientific Data and the Research Data Journal for the Humanities and Social Sciences. The Transparency and Openness Promotion (TOP) Guidelines outline standards that journals can adopt to show which open practices they have introduced.
Blog post: Is it Finally the Year of Research Data? The STM Association Thinks So