Enhancing Data Quality and Overcoming Bad Data
As discussed earlier, one of the primary concerns of data collection is to gather useful, accurate, complete and appropriate data that will allow you to answer your research question. Robinson et al. (2019) define quality data as data that (1) are fit for their intended purpose and (2) have a close relationship with the construct they are intended to measure. It is therefore important to be intentional about the quality of your data at every stage, from conceptualization and operationalization (measurement) to collection and analysis. Paying attention to the main types of error that creep into research can help in this endeavour. We briefly discuss four of them below:
- Coverage error: Whether you are sampling a large population, the literature on a topic or blog posts online, if your data collection method excludes groups, sources or social artifacts that are important to your research, you will have coverage error and, with it, data quality issues.
- Measurement error: If your measures do not capture the concepts that are central to your research question, your data will be of little use (see Chapter 7 for a discussion of measurement).
- Non-response error: If a significant portion of your sample refuses to answer some (or all) of the questions on your research instrument, you might not have enough information to answer your research question.
- Sampling error: This occurs when the characteristics of your sample differ from those of the population from which it was drawn. It is mainly a concern in quantitative research, where it undermines the representativeness of the sample.
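Sampling error can be made concrete with a small simulation. The sketch below is a hypothetical illustration, not a procedure from this chapter: it invents a population of 10,000 ages, repeatedly draws simple random samples of different sizes, and uses the spread (standard deviation) of the sample means as a rough measure of sampling error. The function name `sampling_error_spread` and all numbers are assumptions chosen for illustration.

```python
import random
import statistics


def sampling_error_spread(population, sample_size, n_draws=1000, seed=42):
    """Estimate sampling error as the spread (SD) of means from repeated samples.

    Each draw is a simple random sample taken without replacement.
    """
    rng = random.Random(seed)
    means = [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(n_draws)
    ]
    return statistics.stdev(means)


# Hypothetical population: 10,000 simulated ages between 18 and 80.
pop_rng = random.Random(0)
population = [pop_rng.randint(18, 80) for _ in range(10_000)]
print("Population mean:", round(statistics.mean(population), 2))

for n in (25, 100, 400):
    spread = sampling_error_spread(population, n)
    print(f"n = {n:>3}: spread of sample means = {spread:.2f}")
```

In this setup the spread should roughly halve each time the sample size quadruples, consistent with the standard error of the mean shrinking in proportion to 1/√n. A larger sample reduces this random error, but it does nothing for coverage error if the sampling frame itself leaves groups out.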
Table 8.1 - Common Errors in Social Research and Some Strategies for Overcoming Them

| Error | Strategies for overcoming it |
|---|---|
| Coverage error | Check the sampling frame to ensure that all members of the target population (individuals, institutions, artifacts, etc.) are included, e.g., are all the blogs on dieting and exercise in BC included in your sampling frame? Check that the sampling frame does not include units outside the target population, e.g., are blogs from Alberta included? Establish clear parameters and check that the sampling frame is up to date, e.g., does it include bloggers who started blogging a month ago as well as those who started 10 years ago? |
| Measurement error | Use established measures where possible. Use multiple measures for the same construct. Pilot test your measures. |
| Non-response error | Set expectations about the kinds of questions that will be asked and the expected duration of the survey or interview. Emphasize the benefits of the research and think of ways to reduce costs to participants. Make questions simple and interesting; surveys should be easy to navigate. |
| Sampling error | Define and specify the population of interest and ensure that the relevant subpopulation is being recruited. Increase the sample size. Choose the selection and sampling procedures that best reach the target population and are most appropriate to the research question (e.g., convenience versus snowball sampling). |

Note. For more information on overcoming errors in social research, see Mellenbergh (2019).
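The first two coverage checks in Table 8.1 amount to comparing the sampling frame with the target population: who is missing, and who does not belong. A minimal sketch, assuming you have some reference list of the target population to compare against, might use set operations. The `Blog` record, the `check_coverage` function, the province-based inclusion rule and the `.example` URLs are all hypothetical, invented to mirror the BC blog example in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen makes instances hashable, so they can go in sets
class Blog:
    url: str
    province: str


def check_coverage(frame, target_population, in_scope):
    """Compare a sampling frame with a reference list of the target population.

    Returns two sets:
    - missing: target-population units absent from the frame (undercoverage)
    - out_of_scope: frame units that fail the inclusion rule (overcoverage)
    """
    frame_set = set(frame)
    missing = set(target_population) - frame_set
    out_of_scope = {unit for unit in frame_set if not in_scope(unit)}
    return missing, out_of_scope


# Hypothetical data for a study of BC dieting and exercise blogs.
target = [
    Blog("bc-fit.example", "BC"),
    Blog("van-runs.example", "BC"),
    Blog("kelowna-eats.example", "BC"),
]
frame = [
    Blog("bc-fit.example", "BC"),
    Blog("van-runs.example", "BC"),
    Blog("calgary-lifts.example", "AB"),
]

missing, out_of_scope = check_coverage(frame, target, lambda b: b.province == "BC")
print("Missing from frame:", sorted(b.url for b in missing))
print("Outside target population:", sorted(b.url for b in out_of_scope))
```

In practice a complete list of the target population is often unavailable, which is usually why coverage error arises in the first place, so a check like this is most useful against a partial benchmark such as a second, independently compiled list. The "up to date" check could be handled the same way by adding, say, a hypothetical start-date field to `Blog` and extending the inclusion rule.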
Table 8.1 outlines some of the main errors that can undermine the data we collect. As will be discussed later in the chapter, these errors can result in missing, incomplete or inappropriate data for our research question. The suggested strategies are by no means exhaustive (see Mellenbergh, 2019, for a more comprehensive discussion), but we hope they help you think more intentionally about your research design and data collection techniques.
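One simple way to spot the missing and incomplete data that non-response produces is to compute, for each question, the share of respondents who left it unanswered. The sketch below assumes responses are stored as one dictionary per respondent, with `None` (or an absent key) marking a skipped item; the function name, question labels and data are hypothetical.

```python
def item_nonresponse_rates(responses, questions):
    """Return the share of respondents with no answer (None or absent) per question."""
    if not responses:
        raise ValueError("no responses to summarize")
    rates = {}
    for q in questions:
        blanks = sum(1 for r in responses if r.get(q) is None)
        rates[q] = blanks / len(responses)
    return rates


# Hypothetical survey: each dict is one respondent; None marks a skipped item.
responses = [
    {"q1_age": 34, "q2_income": None, "q3_exercise_days": 3},
    {"q1_age": 51, "q2_income": 62000, "q3_exercise_days": None},
    {"q1_age": 27, "q2_income": None, "q3_exercise_days": 5},
    {"q1_age": 45, "q2_income": None},  # respondent stopped before q3
]
questions = ["q1_age", "q2_income", "q3_exercise_days"]

rates = item_nonresponse_rates(responses, questions)
# List the questions with the most missing answers first.
for q, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{q}: {rate:.0%} missing")
```

Items with unusually high rates, like the hypothetical income question here, are candidates for rewording, repositioning or pilot testing, in line with the non-response strategies in Table 8.1.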
References
Mellenbergh, G. J. (2019). Counteracting methodological errors in behavioral research. Springer International Publishing.
Robinson, J., Rosenzweig, C., Moss, A. J., & Litman, L. (2019). Tapped out or barely tapped? Recommendations for how to harness the vast and largely unused potential of the Mechanical Turk participant pool. PLoS ONE, 14(12), e0226394. https://doi.org/10.1371/journal.pone.0226394
Glossary
Quality data: Data that are fit for their intended purpose and have a close relationship with the construct they are intended to measure.
Coverage error: Failure to include all components of the target population being studied, often arising from incomplete sampling frames.
Measurement error: Occurs when the response provided differs from the real value; such errors may be attributable to the respondent, the interviewer, the questionnaire, the collection method or the respondent's record-keeping system (OECD, 2013).
Non-response error: Occurs when a researcher fails to get a response to at least one of the questions on an instrument.
Sampling error: A statistical error arising from the failure to select a sample that fully represents the population. Some sampling error is inevitable each time a sample is drawn.