Chapter 2 What Data Looks Like and Summarizing Data
2.3.1 Relative Frequency: Adding Percentages
Simply counting the frequency of a variable’s different categories (or the number of specific responses) is rarely enough. Often, we also want to know what proportion, or what percentage, of the total each category represents. This is especially important when comparing across two or more groups. Thus we will stop on our way to frequency tables to undertake a brief side quest into relative frequency territory.
Watch Out!! #3… for Cross-Group Comparisons Using Counted Numbers
Imagine that researchers are conducting a study on eating habits and they have interviewed 170 people; 102 identified as men and 68 identified as women. Say that the researchers found that 17 of the men and 13 of the women reported a vegan diet. Can the researchers conclude that men tend to favour vegan diets more than women do?
If you go by the actual, counted numbers reported, you may decide that yes, the researchers’ conclusion is correct as 17 is more than 13, i.e., four more men than women have reported eating vegan. This, however, would be wrong. We cannot compare the two groups (men and women) directly since the groups have different sizes. That is, comparison of the numbers as counted in the two groups has little meaning since it does not take into account group size. Yes, more men report eating vegan but men in the study outnumber women by 34 to start with. Thus, maybe we find more vegan men than women simply because there are more men than women in the study. What we should be asking ourselves instead is whether a larger proportion of men eat vegan, compared to women, and answering that requires comparing the numbers relative to group size.
A quick calculation reveals that 17 out of 102 is actually less than 13 out of 68:
$$\frac{17}{102} \approx 0.167 < 0.191 \approx \frac{13}{68}$$
That is, the proportion of vegan men (0.167) is smaller than the proportion of vegan women (0.191), so no, we cannot say that men tend to be vegan more than women do. Rather, it’s the other way around: women tend to eat vegan more than men do, because vegan women make up a higher proportion of their group (i.e., the number for women is higher relative to their group size).
To conclude, never use numbers as counted to compare between groups (unless they are of equal size). To make comparison possible — and meaningful — you should always use proportions or percentages (i.e., the numbers relative to the size of each group).
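The comparison above can be sketched in a few lines of Python. This is only a minimal illustration using the counts given in the example; the helper name `proportion` and the dictionary layout are assumptions made for the sketch, not part of the study described.

```python
# Counts from the Watch Out!! #3 example: (number of vegans, group size).
# The dictionary layout and the helper name are illustrative assumptions.
groups = {
    "men": (17, 102),
    "women": (13, 68),
}


def proportion(f, n):
    """Return the count f divided by the group size n."""
    if n <= 0:
        raise ValueError("group size must be positive")
    return f / n


proportions = {name: proportion(f, n) for name, (f, n) in groups.items()}

for name, p in proportions.items():
    print(f"{name}: {p:.3f}")

# Raw counts favour men (17 > 13), but the proportions favour women.
larger_share = max(proportions, key=proportions.get)
print("larger share of vegans:", larger_share)
```

Comparing the raw counts and comparing the proportions point in opposite directions here, which is exactly why group size has to enter the calculation.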
A bit more notation then: if we denote frequency by f, and you recall that N stands for number (of elements in a dataset, of people in a group, etc.), it is easy to see that the proportion, denoted by p, should be
$$p = \frac{f}{N}.$$
While actual numbers represent frequency, proportions are one way of expressing relative frequency. You probably are more familiar with another way of expressing relative frequency — percentages.
In the example I used in the Watch Out!! #3 above, we concluded that women were more likely than men to be vegan based on the fact that the proportion of vegan women (0.191) was higher than the proportion of vegan men (0.167). In everyday life, people usually use percentages to express that. To convert a proportion to a percentage you only need to multiply by 100[1]:
$$\text{percentage} = p \times 100 = \frac{f}{N} \times 100$$
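Thus, we get the following percentages when comparing vegan men and women from the Watch Out!! #3 above:
$$\frac{17}{102} \times 100 \approx 16.7\% \quad \text{(men)}$$
and
$$\frac{13}{68} \times 100 \approx 19.1\% \quad \text{(women)}.$$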
That is, we could rephrase our finding and say that since only 16.7 percent of men reported being vegan while 19.1 percent of women did, clearly women are more likely to be vegan based on this particular group of respondents.
Note that while proportions range from 0 to 1 and are typically rounded to three digits after the decimal point (e.g., 0.167 and 0.191), percentages range from 0 to 100 and are usually rounded to one or two digits after the decimal point (e.g., 16.7% and 19.1%). Also note that differences between percentages are expressed in percentage points, not in percent: in the current example, the difference between women and men who eat vegan is (19.1% − 16.7% =) 2.4 percentage points in favour of women, not 2.4 percent.
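As a hedged sketch of the conversion and of the percentage-point distinction, the snippet below recomputes the two percentages and their difference. The function name `to_percent` is an assumption made for illustration. The difference is taken from the rounded percentages, as in the text, and the relative difference is included only to show what "percent" would mean in contrast to "percentage points".

```python
def to_percent(f, n, digits=1):
    """Convert a count f out of n into a percentage rounded to `digits` places."""
    return round(f / n * 100, digits)


pct_men = to_percent(17, 102)    # 16.7
pct_women = to_percent(13, 68)   # 19.1

# Difference between two percentages: measured in percentage points.
# As in the text, it is computed here from the rounded percentages.
diff_points = round(pct_women - pct_men, 1)

# Relative difference: how much larger the women's percentage is
# compared with the men's percentage, measured in percent.
relative_diff = (pct_women - pct_men) / pct_men * 100

print(f"men: {pct_men}%, women: {pct_women}%")
print(f"difference: {diff_points} percentage points")
print(f"relative difference: about {relative_diff:.0f} percent")
```

The two results answer different questions: the first is a simple subtraction of percentages, while the second expresses that gap as a share of the men's percentage.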
A final way to express relative frequency is as a ratio, where a ratio is simply one frequency/count relative to another:
$$\text{ratio} = \frac{f_1}{f_2}$$
Using the numbers from the Watch Out!! #3 above, we can say that in the group of 170 respondents (102 men and 68 women), we have a men-to-women ratio of 1.5; that is, men in the study outnumber women by 1.5 to 1, since
$$\frac{102}{68} = 1.5.$$
It’s easy to see that if we want the women-to-men ratio, we only need to switch the numerator and denominator of the ratio:
$$\frac{68}{102} \approx 0.667 \approx 0.7$$
This still tells us that men outnumber women, as for every 1 man there is only a “0.7 woman”. Since fractions of this type can, depending on the context, lead to awkward phrasing (as in this case), you may choose to report a ratio in whichever direction is easiest to interpret.
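A small sketch of both ratios, using the group sizes from the example. Reducing the ratio to lowest terms with Python's standard `fractions.Fraction` is a choice made for this illustration, offered as one possible way to avoid the awkward fractional phrasing.

```python
from fractions import Fraction

men, women = 102, 68

men_to_women = men / women      # 1.5
women_to_men = women / men      # about 0.667

# Fraction reduces the ratio to lowest terms, which allows a phrasing
# in whole numbers instead of a fractional person.
reduced = Fraction(men, women)  # 3/2

print(f"men-to-women: {men_to_women:.1f}")
print(f"women-to-men: {women_to_men:.1f}")
print(f"{reduced.numerator} men for every {reduced.denominator} women")
```

Reporting "three men for every two women" conveys the same information as a ratio of 1.5 while keeping every quantity a whole number.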
Relative frequencies are all well and good, but let’s go back to our main quest, the frequency table. Since we established that counted numbers alone are of little use for comparison purposes and that we need relative frequencies for that, it only makes sense to add a relative frequency column to our educational attainment Table 2.1 from Example 2.2 (B).
The percentages in Table 2.2 below have all been calculated using the steps described above: 1) obtain the proportion, and 2) multiply by 100. For example, only one of our original 21 respondents had no degree. Then the percentage of the 21 respondents with no degree is:
$$\frac{1}{21} \times 100 \approx 4.76\%$$
The rest of the categories’ percentages are calculated in the same vein.
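The same two steps can be applied to every category at once. The sketch below is a minimal illustration that uses the category counts from the educational attainment example and rounds to one decimal place; the variable names are assumptions made for the example.

```python
# Frequencies for each educational attainment category (counts from the example).
frequencies = {
    "No degree": 1,
    "Secondary/High School": 6,
    "Associate's": 3,
    "Bachelor's": 5,
    "Master's": 2,
    "PhD": 1,
    "Didn't answer": 3,
}

total = sum(frequencies.values())  # N = 21

# Step 1: obtain the proportion f / N; step 2: multiply by 100.
percents = {
    category: round(f / total * 100, 1)
    for category, f in frequencies.items()
}

for category, f in frequencies.items():
    print(f"{category:<22} {f:>3} {percents[category]:>6.1f}")
print(f"{'TOTAL':<22} {total:>3}")
```

Because each percentage is rounded separately, the rounded values need not add up to exactly 100.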
Example 2.2 (C) Hypothetical Data on Educational Attainment, Organized and with Relative Frequencies Added
Table 2.2 Educational Attainment by Frequency and Percent
Degree | Frequency | Percent |
No degree | 1 | 4.8 |
Secondary/High School | 6 | 28.6 |
Associate’s | 3 | 14.3 |
Bachelor’s | 5 | 23.8 |
Master’s | 2 | 9.5 |
PhD | 1 | 4.8 |
Didn’t answer | 3 | 14.3 |
TOTAL | 21 | 100.0 |
This way we can easily see how the respondents are distributed across the different educational attainment categories and each category’s share as a fraction of the total. If we had another group of respondents, we could easily compare between our initial group of 21 and the second hypothetical group by using the percentages above. Or can we?
[1] After all, percent or per cent comes from the Latin "per centum", meaning "by a hundred"; i.e., whatever proportion you are expressing, standardized by a hundred.