Chapter 7 Variables Associations
7.1 Types of Bivariate Associations
To start with, what does it mean for two variables to be associated? Even without prior knowledge of statistics and its terminology, you have likely considered, or at least noticed, associations between variables both during your studies and in general. For example, you probably know that fertility rates are higher in some countries and lower in others, and you might also know that the level of socioeconomic development tends to differ between the two groups. You might also have noticed that, say, early childhood educators and hospital nurses tend to be women, while auto mechanics or refrigerator repair technicians tend to be men. You certainly know that (for now) prime ministers in Canada and presidents of the USA have tended to be white (and male, and Christian).
These, of course, are all examples of associations between variables. Whenever specific attributes of one variable tend to appear more often with certain attributes of another variable, you’re looking at an association. That is, we’re looking for a pattern between the sets of attributes of two variables: a pattern where some attribute combinations are seen more frequently while others are observed less often.
Recall that we defined variables as characteristics that vary across cases. Variables can vary independently of one another, or they can vary together (in tandem, as it were) in such a way that when some attributes of one variable are present, you’d expect to see some specific attributes of the other variable present too. For example, countries defined as developed tend to have lower fertility rates than countries defined as developing, so we have the variable level of socioeconomic development on the one hand, and the variable fertility rate on the other. The association pairs high levels of the former variable with low levels of the latter and, vice versa, low levels of the former with high levels of the latter. These two combinations (high development/low fertility and low development/high fertility) are observed more often than we would expect in a no-pattern situation, where all combinations of development and fertility levels would be equally likely.
Similarly, research has repeatedly shown that some occupations tend to be male-dominated while others are female-dominated. If there were no association (i.e., no pattern between the two sets of attributes), we would expect to observe approximately equal numbers of women and men in all occupations, but from what we have seen, that’s not the case. That is, it seems there is an association between the variables gender and (choice of) occupation. Furthermore, participation in Canadian and US politics (and voters’ preferences), especially at the highest levels of power, also appears to be gendered, as well as associated with other variables like race/ethnicity and religious affiliation.
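To make the idea of a pattern between attribute combinations concrete, here is a minimal sketch in Python. The counts are made up purely for illustration (they are not real data), and the names `cases`, `cross_tab`, and `row_shares` are hypothetical helpers, not part of any statistics package. The sketch counts how often each development/fertility combination occurs and then reports, within each development level, the share of cases at each fertility level.

```python
from collections import Counter

# Hypothetical, made-up cases for illustration only:
# each case is a (development level, fertility level) pair.
cases = (
    [("developed", "low")] * 40
    + [("developed", "high")] * 10
    + [("developing", "low")] * 10
    + [("developing", "high")] * 40
)


def cross_tab(pairs):
    """Count how often each attribute combination occurs."""
    return Counter(pairs)


def row_shares(pairs):
    """Within each attribute of the first variable, compute the share
    of cases that have each attribute of the second variable."""
    counts = cross_tab(pairs)
    row_totals = Counter(first for first, _ in pairs)
    return {combo: n / row_totals[combo[0]] for combo, n in counts.items()}


shares = row_shares(cases)
for (development, fertility), share in sorted(shares.items()):
    print(f"{development:>10} / {fertility:<4} fertility: {share:.0%}")
```

In this invented example, the shares differ sharply between the two development levels, which is the kind of pattern described above. If the shares were roughly the same in both groups, the two variables would appear to vary independently of one another.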
Do It! 7.1 Bivariate Associations
Try to think of some other bivariate associations on your own. Start with something simple, like asking yourself if you commonly encounter some characteristic alongside a specific other characteristic; e.g., are dark-haired people more likely to have brown eyes, and blond people more likely to have blue eyes? (Or, are the combinations dark hair/brown eyes and blond hair/blue eyes more common than dark hair/blue eyes and blond hair/brown eyes? Is hair colour related to, that is, associated with, eye colour?) And so on.
Now that you are more familiar with the vocabulary of associations, let’s clarify the typology of variable associations. There are two substantively different types of variable associations: statistical associations and causal associations. Claiming a causal association between variables is a stronger claim than claiming a statistical association. Further, a statistical association between two variables is a prerequisite for claiming a causal association between them: it is a necessary but not sufficient condition.
Statistical inference provides tests for establishing statistical association, and I’ll introduce you to some of the basics in the remaining chapters. Establishing causality, however, takes statistical associations only as a starting point, as you will see later on. Statistical associations are for the most part a technical matter; causality, on the other hand, is based on logic. It involves one’s ability to consider (and account for) multiple variables’ associations at the same time.
When two variables vary together, we simply can say they are associated; however, when we claim causality, we call one variable the cause (or predictor) and the other the effect (or outcome).
In summary, finding out whether two variables are statistically associated (i.e., whether some attributes of one of the variables tend to go with specific attributes of the other) is relatively easy. Claiming that one variable affects another (i.e., that changes in one variable produce/cause changes in the other variable), on the other hand, is not easy at all; in the social world, it is quite difficult. But we’ll get to that later.
For now, let’s start with statistical associations and how to “find” them. To get there, we first need to take a brief trip to (almost) everyone’s favourite land of descriptive statistics in order to learn how to recognize potential statistical associations in the first place. We do that through bivariate description, i.e., by describing two variables together, considering them and their potential association at the same time.