20 Handling Sensitive Data
Working with sensitive data requires that you have measures in place to ensure its security throughout your project’s lifecycle.
Confidentiality and Anonymity
Confidential data has been entrusted to a person or entity (e.g., you as a researcher) on the understanding that it will be kept private and that access to it will be controlled or restricted (PNSDEG, 2020).
When you assure research participants that their data will be kept confidential – e.g., by including this assurance on a consent form – you need to have a plan in place for how you manage that data securely.
One way to protect research participants is to collect anonymous data; that is, data which never has identifiers attached. If identifiers need to be collected, their removal at the earliest feasible point in your research process is recommended. This process is typically referred to as de-identification.
It is also important to note that not all data should be de-identified. In some forms of research, participants’ contributions are directly attributed to them to recognize their inputs and role in a project. Community-based projects and projects that collect oral histories and biographical data, for example, may take this approach.
Anonymous and De-identified Data [1]
Data is considered anonymous if it was originally collected without direct identifiers. Direct identifiers are pieces of information that can be used to identify a person or organization. Examples of this type of identifier include:
- Name
- Social insurance number
- Personal health number
Surveys are one research design that can collect anonymous data: a survey can be designed so that consent is implied by completing it and no directly identifying information is collected. A potential concern, however, is that indirect identifiers can still be present in the data.
Indirect identifiers may be used in combination with other information to identify an individual. Examples of this type of identifier include:
- Date of birth
- Geographic information (e.g., workplace, place of residence, etc.)
- Level of education attained
- Personal characteristics
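To illustrate how indirect identifiers can single someone out in combination, the sketch below counts how many records share each combination of values. It is only an illustration: the records, the field names (`birth_year`, `city`, `education`), and the helper functions are all hypothetical, and a real risk assessment would involve more than a simple count.

```python
from collections import Counter

# Hypothetical survey records; no direct identifiers were collected.
records = [
    {"birth_year": 1985, "city": "Halifax", "education": "Bachelor's"},
    {"birth_year": 1985, "city": "Halifax", "education": "Bachelor's"},
    {"birth_year": 1990, "city": "Regina", "education": "Doctorate"},
    {"birth_year": 1985, "city": "Halifax", "education": "Master's"},
]

# Assumed set of indirect identifiers for this example.
INDIRECT_IDENTIFIERS = ("birth_year", "city", "education")


def combination_counts(rows, keys):
    """Count how many rows share each combination of values for `keys`."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def unique_combinations(rows, keys):
    """Return the combinations that occur exactly once in the data."""
    counts = combination_counts(rows, keys)
    return sorted(combo for combo, n in counts.items() if n == 1)


# Combinations held by a single record are the easiest to re-identify.
risky = unique_combinations(records, INDIRECT_IDENTIFIERS)
print(risky)
```

A combination that appears only once means that anyone who knows those few facts about a participant could pick out their record, even though no name was ever collected.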
De-identification is the process of removing, transforming, or masking identifiers in a data set. Anonymized data (also called de-identified data) has had direct identifiers removed. Indirect identifiers can also be removed to reduce the risk of re-identification.
For example, Statistics Canada releases public use microdata files in which some variables have been removed and others collapsed or recoded (e.g., by merging two income categories into one larger range) to lower the risk of re-identification.
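The steps described above (removing direct identifiers and collapsing or recoding indirect ones) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a recommended procedure: the field names, the income bands, and the `deidentify` helper are hypothetical, and incomes are assumed to be non-negative numbers.

```python
from datetime import date

# Assumed field names for direct identifiers in this example.
DIRECT_IDENTIFIERS = {"name", "social_insurance_number", "personal_health_number"}

# Hypothetical income bands: (lower bound inclusive, upper bound exclusive, label).
INCOME_BANDS = [
    (0, 30_000, "under 30,000"),
    (30_000, 60_000, "30,000 to 59,999"),
    (60_000, 100_000, "60,000 to 99,999"),
]
TOP_BAND = "100,000 or more"


def income_band(income):
    """Recode an exact income into a broader range."""
    if income < 0:
        raise ValueError("income is assumed to be non-negative")
    for low, high, label in INCOME_BANDS:
        if low <= income < high:
            return label
    return TOP_BAND


def deidentify(record):
    """Drop direct identifiers and generalize two indirect identifiers."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "date_of_birth" in cleaned:
        # Keep only the year of an ISO-formatted date (YYYY-MM-DD).
        cleaned["birth_year"] = date.fromisoformat(cleaned.pop("date_of_birth")).year
    if "income" in cleaned:
        cleaned["income_band"] = income_band(cleaned.pop("income"))
    return cleaned


# Hypothetical participant record.
participant = {
    "name": "A. Example",
    "social_insurance_number": "000-000-000",
    "date_of_birth": "1985-03-14",
    "income": 45_250,
    "education": "Bachelor's",
}
print(deidentify(participant))
```

Which variables to drop and how coarse to make the ranges depends on the data set and its re-identification risk, so the bands here should be read as placeholders only.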
More information on de-identification can be found in De-identification Guidance produced by the Portage Network COVID-19 working group.
1. Definitions in this section are adapted from Portage Network Sensitive Data Expert Group (2020, p. 6). Licensed CC-BY-NC-4.0.
Consent
The consent process needs to describe how data will be collected and how participant identities will be protected.
Consent is an important aspect of human participant research and of the collection of sensitive data. Research with human participants requires research ethics board (REB) review, and as part of that review you must explain the consent processes to be used in your project.
The TCPS 2 (2022) contains several principles of consent:
- It is given voluntarily.
- It is informed (i.e., participants receive enough information to make an informed decision about their participation).
- It is ongoing (i.e., participants must be informed of changes in a project that affect them and have the opportunity to reconsider their consent).
- Material incidental findings (i.e., findings outside the purpose of the study that impact the welfare of the participant) are disclosed.
- Consent must precede collection of, or access to, data.
Free and informed consent should be sought and documented; however, some modifications to these consent principles may be allowed, subject to REB review, in specific circumstances (for example, in a critical inquiry). If you have questions about consent processes, consult the TCPS 2 (2022), your supervisor, and your local REB.
Language related to data in the consent process must cover:
- The scope of how data will be used
- How confidentiality and anonymity are handled (and limits to anonymity)
- How long data is retained and any access restrictions
- Any limitations to how data can be withdrawn from the study
- Future usage of the data
The Sensitive Data Expert Group (2020) has created a document of sample language that can be considered and reworked for your own project. Your local REB may also have templates and language recommendations that you can consult.
Participants can also be asked to consent to secondary data use and the potential deposit and sharing of the data they provide. Chapter 3 Section E of the TCPS 2 (2022) addresses broad consent for storage of data (and human biological materials) for future research.
Broad consent allows participants to agree to have their data stored for secondary use in future research, without researchers contacting them directly (or intervening) for those future projects. Broad consent differs from blanket consent: blanket consent for unrestricted re-use for any purpose is not allowed under the TCPS 2, whereas broad consent is. Broad consent should include specific restrictions, for example, limiting future research to the same field or to the same topic or disease, or preventing use by private industry (TCPS 2, 2022).
If you are planning to store and reuse data or to share it for future re-use, you should consult Chapter 3, Section E of the TCPS 2 (2022). Article 3.13 lists ten considerations to contemplate as appropriate to your particular project.
Secondary Use
Secondary use occurs when you use data that was collected for some other purpose, and not specifically for your study.
This type of data is referred to as secondary data. Using secondary data reduces the burden on potential research participants because you are not duplicating efforts by asking similar questions. Secondary data can also be used to confirm or replicate a previous study’s results or as a means by which to test new and related questions and hypotheses.
Note that according to TCPS 2 (2022) Article 2.4, REB review is not required for secondary use of anonymous information, provided any data linkage or dissemination does not create identifiable information. However, for data that has been anonymized (or de-identified), an ethics review is required. Consent for this data re-use may need to be sought (see Article 5.5 in TCPS 2, 2022). It is a good idea to consult with a research ethics officer at your institution to clarify how to approach any particular secondary use of human participant data.
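The distinction drawn above can be written out as a small decision rule. The sketch below encodes only what this lesson states and is a rough thinking aid, not a substitute for the TCPS 2 or for advice from your research ethics officer; the function and parameter names are hypothetical.

```python
def reb_review_required(collected_anonymously: bool,
                        creates_identifiable_information: bool) -> bool:
    """Rough reading of the rule described in this lesson for secondary use.

    collected_anonymously: True if the data never had identifiers attached
        (anonymous), False if identifiers were removed later (anonymized
        or de-identified).
    creates_identifiable_information: True if linkage or dissemination
        could generate identifiable information.
    """
    exempt = collected_anonymously and not creates_identifiable_information
    return not exempt


# Anonymous data, no identifying linkage: review not required.
print(reb_review_required(True, False))
# De-identified data: review required.
print(reb_review_required(False, False))
```

Real cases are rarely this clear-cut, which is why checking with your institution's research ethics officer remains the safer route.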