# 66 Secondary Data Analysis

Many of you will be using secondary data in your thesis. Statistics Canada hosts a number of datasets that might be the source of your data (see Box 10.1 below). Many of these datasets can only be accessed through your institution (Research Data Centres), so be sure to check with your librarian about access. In some cases, it might be necessary to obtain an institutional ethics review before you can use an existing dataset, so check early whether there are any restrictions on access to or use of the dataset (see Chapter 3).

Box 10.1 – Examples of Datasets Available at Statistics Canada

• Workplace and Employee Survey
• Survey of Financial Security
• Survey of Earned Doctorates
• National Apprenticeship Survey
• Victimization Survey
• General Social Survey
• Census and National Household Survey
• National Population Health Survey
• Survey of Household Spending
• Vital Statistics (Birth Database)
• Aboriginal Peoples Survey
• Survey of Family Expenditure

For a full list of surveys by Statistics Canada, see https://www.statcan.gc.ca/en/microdata/data-centres/data

As discussed in Chapter 7, the methodological limitations of your data source will also limit what your analysis can show. To get your secondary data ready for analysis, we suggest the following steps:

1. Understand the dataset (population, sampling process, level of representativeness, units of measurement, descriptive statistics, etc.). You can do this by consulting the codebook.
2. Statistical concerns: For example, you should always check whether the data is normally distributed, whether the observations are independent, and whether the variances are homogeneous. These checks will affect the kinds of statistical analyses that you can do. For example, if the data is clearly not normally distributed, parametric tests such as one-way and two-way ANOVA may not be appropriate.
3. Sampling: Make sure that you establish how the sample was drawn. This will determine the limitations of the study. Also, look out for issues such as non-response rates (sample and item).
4. Data cleaning: Decide what to do with missing data, outliers, etc.
5. Determine how you will treat key variables: Examine the codebook to see how the variables of interest were originally measured. Recode them in a way that makes sense for your project. Be mindful of the direction of the measures, as this can affect your interpretation. It is best to recode variables that do not run in the same direction; e.g., if a higher number indicates more of the attribute on one variable, ensure that this is consistent across variables.
6. Explain your (re)coding strategy: Make a note of how variables were recoded and why. If you are using an index or a scale, explain why you chose that particular index or scale. Justify it theoretically or point to previous research that used a similar index or scale. If your analytical strategy is different from those in the literature, explain why (see Chapter 7).
7. Know the assumptions of the tests that you are thinking of doing, and make sure that your data satisfies them.
8. Start with descriptive analysis to get a feel for the data before performing bivariate and multivariate statistics.
9. Record your statistical results according to the referencing format that you are using.
10. Interpret and discuss the results.
11. See Samuels (2020) for additional steps in quantitative data analysis.
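
To make the cleaning, recoding, and descriptive steps above more concrete, here is a minimal sketch in Python using only the standard library. Everything specific in it is an assumption made for illustration: the 1 to 5 response scale, the missing-value codes 8 and 9, the variable name `trust_raw`, and the made-up responses do not come from any Statistics Canada dataset. Always take the real codes and scale direction from the codebook of the dataset you are using.

```python
from statistics import mean, median, stdev

# Hypothetical codebook conventions (assumptions for illustration only):
# a 1-5 scale where 1 = strongly agree and 5 = strongly disagree,
# with 8 = "don't know" and 9 = "not stated" as missing-value codes.
MISSING_CODES = {8, 9}
SCALE_MIN, SCALE_MAX = 1, 5


def clean(values):
    """Replace codebook missing-value codes with None."""
    return [None if v in MISSING_CODES else v for v in values]


def reverse_code(values):
    """Flip the scale so that a higher number means stronger agreement."""
    return [None if v is None else SCALE_MIN + SCALE_MAX - v for v in values]


def describe(values):
    """Basic descriptive statistics, ignoring missing values."""
    valid = [v for v in values if v is not None]
    return {
        "n_valid": len(valid),
        "n_missing": len(values) - len(valid),
        "mean": mean(valid),
        "median": median(valid),
        "sd": stdev(valid),
    }


# Made-up responses to a hypothetical survey item.
trust_raw = [1, 2, 9, 4, 5, 8, 2, 1]

trust_clean = clean(trust_raw)          # step 4: handle missing data
trust = reverse_code(trust_clean)       # step 5: align the direction
summary = describe(trust)               # step 8: descriptive analysis

print(summary)
```

Keeping the raw variable (`trust_raw`) alongside the recoded one makes it easier to document the recoding strategy later, since you can always show what was changed and how. Note that this sketch simply sets missing responses aside; whether that is appropriate for your project is a decision you need to justify.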

# References

Samuels, P. (2020). A really simple guide to quantitative data analysis. ResearchGate. https://doi.org/10.13140/RG.2.2.25915.36645