Chapter 6 Sampling, the Basis of Inference
6.7.2 Confidence Intervals for Proportions
Just as we may want to know the population mean of something (like the average annual income above), we might want to know the population proportion of something else (like, say, the proportion of Canadians working part-time). Population proportions are, like population means, parameters that can be estimated.
The principle of estimating a population proportion through a confidence interval is the same as estimating the mean — we need a standard error for creating error bounds around the sample statistic (in this case, the proportion).
The question, however, is how to calculate the standard error of a proportion. After all, the CI formula requires a standard deviation, and proportions do not have one, as the dispersion measures we studied apply only to interval/ratio data (if you recall from Section 4.4, https://pressbooks.bccampus.ca/simplestats/chapter/4-4-standard-deviation/). Calculating the mean and the standard deviation of an interval/ratio variable is all well and good, but what do we do with proportions, considering that they relate to categories, not numerical values?
In fact, there is a way to measure dispersion in a binary distribution (i.e., where there are only two categories/outcomes, e.g., employed vs. unemployed, women vs. men, undergraduate vs. graduate students, heads vs. tails, approval vs. disapproval, yes vs. no, success vs. failure, etc.). Unlike interval/ratio variables (which usually have an approximately normal, and continuous, distribution), such a binary distribution is a discrete distribution.
Since the standard deviation is off the table, here is an example to demonstrate the logic underlying the measurement of variability of proportions.
Example 6.6 Variability Through Clothing
Imagine you have a friend who is partial to the colour black, so much so that they always wear a monochromatic, all-black outfit. Then one day you notice your friend is wearing a single article of a different colour, say, dark purple. Arguably, that’s more variability than wearing all-black, but the outfit will still be predominantly black. Then on the next day, there are two pieces of purple amid all the black, then three, then four, and so on. At what point would your friend’s outfit stop being “predominantly black” and become “predominantly purple”? And what would happen eventually, if the exchanging-black-for-purple trend continues?
The answer to the latter question is obvious: the end point of such a trend would be for the outfit to become monochromatic again, this time all-purple. Now think about variability. At what point was there the greatest and at what point was there the least amount of variability in your imaginary friend’s outfit?
To make it easier, let’s add a numerical aspect to what we have imagined, and say that your friend’s outfit consisted of 10 articles of clothing (and accessories) to start with, and then your friend swapped a black article for a purple article on each successive day, for ten days straight after that. Table 6.1 illustrates.
Table 6.1 Black and Purple Articles of Clothing
| Black Articles | Purple Articles |
Initial state | 10 | 0 |
Day 1 | 9 | 1 |
Day 2 | 8 | 2 |
Day 3 | 7 | 3 |
Day 4 | 6 | 4 |
Day 5 | 5 | 5 |
Day 6 | 4 | 6 |
Day 7 | 3 | 7 |
Day 8 | 2 | 8 |
Day 9 | 1 | 9 |
Day 10 | 0 | 10 |
Again, on what day(s) would your friend’s outfit be the least and the most variable in terms of colour? Looking at Table 6.1, it’s not difficult to spot that the least variable were your friend’s initial (all-black) outfit and the all-purple one worn on Day 10, both consisting of a single colour. There is slight variability on Days 1 and 9 (when there was a single article of a different colour); more variability on Days 2 and 8 (two such articles); even more on Days 3 and 7 (three articles); and yet more on Days 4 and 6 (four articles). The outfit was most variable on Day 5, when it was half-black and half-purple, with neither colour predominating.
Going by “half-black and half-purple”, let’s restate the information in Table 6.1 in terms of proportions, as this will help us generalize the logic without the constraint of an actual count (of 10 articles of clothing, or anything else).
Table 6.2 (A) Black and Purple Articles of Clothing: Proportions
| Black Articles | Purple Articles |
Initial state | 1 | 0 |
Day 1 | 0.9 | 0.1 |
Day 2 | 0.8 | 0.2 |
Day 3 | 0.7 | 0.3 |
Day 4 | 0.6 | 0.4 |
Day 5 | 0.5 | 0.5 |
Day 6 | 0.4 | 0.6 |
Day 7 | 0.3 | 0.7 |
Day 8 | 0.2 | 0.8 |
Day 9 | 0.1 | 0.9 |
Day 10 | 0 | 1 |
One convenient way to quantify what we found in terms of the least and the greatest variability is by multiplying the proportions in the two columns, like so:
Table 6.2 (B) Black and Purple Articles of Clothing: Variability
| Black Articles | Purple Articles | Variability |
Initial state | 1 | 0 | 1(0)=0 |
Day 1 | 0.9 | 0.1 | 0.9(0.1)=0.09 |
Day 2 | 0.8 | 0.2 | 0.8(0.2)=0.16 |
Day 3 | 0.7 | 0.3 | 0.7(0.3)=0.21 |
Day 4 | 0.6 | 0.4 | 0.6(0.4)=0.24 |
Day 5 | 0.5 | 0.5 | 0.5(0.5)=0.25 |
Day 6 | 0.4 | 0.6 | 0.4(0.6)=0.24 |
Day 7 | 0.3 | 0.7 | 0.3(0.7)=0.21 |
Day 8 | 0.2 | 0.8 | 0.2(0.8)=0.16 |
Day 9 | 0.1 | 0.9 | 0.1(0.9)=0.09 |
Day 10 | 0 | 1 | 0(1)=0 |
That is, starting from zero, variability is the highest at precisely the half-and-half point, when neither outcome/category (in our example, neither colour) predominates.
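The variability column of Table 6.2 (B) can be reproduced with a few lines of code (a sketch in Python, using the proportions taken straight from the table):

```python
# Proportions of black articles from Table 6.2 (B): initial state, then Days 1-10
p_black = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]

# Variability = (proportion black) x (proportion purple) = p(1 - p)
variability = [round(p * (1 - p), 2) for p in p_black]

print(variability)
# [0.0, 0.09, 0.16, 0.21, 0.24, 0.25, 0.24, 0.21, 0.16, 0.09, 0.0]
```

Note the symmetry of the result: p(1 - p) peaks at 0.25 exactly when p = 0.5, the half-and-half point.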
Now we are ready for the formula to measure the dispersion of a proportion. I demonstrate it by restating Table 6.2 (B), designating black as 1 and purple as 0, and taking black as the colour of interest (i.e., all proportions will be expressed in terms of black).
Table 6.2 (C) Black and Purple Articles of Clothing: Generalized
| Black Articles | Non-black Articles | Variability |
Initial state | 1 | 0 | 1(0)=0 |
Day 1 | 0.9 | (1-0.9) | 0.9(1-0.9)=0.09 |
Day 2 | 0.8 | (1-0.8) | 0.8(1-0.8)=0.16 |
Day 3 | 0.7 | (1-0.7) | 0.7(1-0.7)=0.21 |
Day 4 | 0.6 | (1-0.6) | 0.6(1-0.6)=0.24 |
Day 5 | 0.5 | (1-0.5) | 0.5(1-0.5)=0.25 |
Day 6 | 0.4 | (1-0.4) | 0.4(1-0.4)=0.24 |
Day 7 | 0.3 | (1-0.3) | 0.3(1-0.3)=0.21 |
Day 8 | 0.2 | (1-0.2) | 0.2(1-0.2)=0.16 |
Day 9 | 0.1 | (1-0.1) | 0.1(1-0.1)=0.09 |
Day 10 | 0 | (1-0) | 0(1-0)=0 |
And there you have it in Table 6.2 (C) above: the formula for calculating the variability of a proportion (i.e., of a discrete binary variable). Since we denote sample proportions with p and population proportions with π, the variability of a proportion is obtained by multiplying the proportion of the outcome we’re interested in by 1 minus that proportion (i.e., by the other outcome’s proportion): p(1-p) for samples and π(1-π) for populations.
Technically speaking, this variability is the proportion’s variance:

σ² = π(1-π)
As usual, to get the proportion’s standard deviation, we take the square root of the variance:

σ = √(π(1-π))
With this, we are finally ready to get back to calculating a confidence interval for a proportion, as we now have everything we need to calculate its standard error. If you recall, the formula for the standard error was:

σ/√N
Substituting the standard deviation of the proportion, we get:

√(π(1-π))/√N = √(π(1-π)/N)
Of course, when we don’t have the population standard deviation, we estimate it with the sample standard deviation — i.e., we need to substitute p for π:

√(p(1-p)/N)
Following our tried-and-true formula for confidence intervals (i.e., the sample statistic ± z standard errors), we ultimately get the confidence interval for a proportion:
- Any % CI: p ± z√(p(1-p)/N)
As with the mean, we can calculate a confidence interval with any preferred level of certainty by substituting the z-value associated with that probability. For example, the 95% confidence interval for the proportion would be:
- 95% CI: p ± 1.96√(p(1-p)/N)
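The confidence-interval formula for a proportion can be packaged as a small helper function (a sketch in Python; the name `proportion_ci` and its default argument are my own, not from the text):

```python
import math

def proportion_ci(p, n, z=1.96):
    """CI for a population proportion: p +/- z * sqrt(p(1-p)/N).

    p: sample proportion, n: sample size,
    z: z-value for the desired confidence level (1.96 for 95%, 2.58 for 99%).
    """
    se = math.sqrt(p * (1 - p) / n)  # estimated standard error
    return (p - z * se, p + z * se)
```

For instance, `proportion_ci(0.5, 100)` returns bounds of roughly 0.40 and 0.60.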
If you find all this too technical and abstract, the following example should help.
Example 6.7 Part-Time Workers in Canada, Age 25-54
Let’s say we want to know what proportion of Canadian workers work part-time, and that we are especially interested in what Statistics Canada calls “the core ages” 25 to 54 (Statistics Canada, 2017: https://www150.statcan.gc.ca/n1/pub/71-222-x/71-222-x2018002-eng.htm).
We conduct a survey of N=1,600 Canadian individuals aged 25-54 and find that 12 percent of our respondents work part-time. As usual, we want to estimate the proportion of all Canadians aged 25-54 who work part time.
We start with calculating the standard error:

√(0.12(1-0.12)/1600) = √(0.1056/1600) = √0.000066 ≈ 0.008
Then, a 95% confidence interval for the proportion would be:
- 95% CI: 0.12 ± 1.96(0.008) = 0.12 ± 0.016 = (0.104; 0.136)
Thus we estimate with 95% certainty that (i.e., 95% of the time such a study is undertaken, it will find that) between 10.4% and 13.6% of Canadian workers aged 25-54 work part-time. Alternatively, we can say with 95% certainty that 12% ± 1.6 percentage points of Canadian workers aged 25-54 work part-time.
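The arithmetic in Example 6.7 is easy to check with a short script (a sketch in Python, using only the numbers given in the example):

```python
import math

p, n, z = 0.12, 1600, 1.96      # sample proportion, sample size, 95% z-value
se = math.sqrt(p * (1 - p) / n)  # estimated standard error of the proportion
lower, upper = p - z * se, p + z * se

print(round(se, 4))                      # 0.0081
print(round(lower, 3), round(upper, 3))  # 0.104 0.136
```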
As there is a lot to take in here, a second example is in order.
Example 6.8 Women in Managerial Positions
Let’s say a large, nationally representative study of N=10,000 finds that women in Canada occupy 36 percent of managerial positions (https://www.expertmarket.com/female-managers). What would be the estimate for Canada as a whole?
The estimated standard error of the proportion would be:

√(0.36(1-0.36)/10000) = √(0.2304/10000) = √0.00002304 = 0.0048
As in the previous examples, the 95% confidence interval for the proportion would be:
- 95% CI: 0.36 ± 1.96(0.0048) = 0.36 ± 0.0094 = (0.3506; 0.3694) ≈ (0.35; 0.37)
That is, we can estimate with 95% certainty that (i.e., 95% of the time such a study is undertaken, it will find that) between 35% and 37% of managerial positions in Canada are occupied by women. Alternatively, we can say with 95% certainty that women occupy 36% ± 0.94 percentage points (roughly 1 percentage point) of managerial positions in Canada[1].
If you find this a bit too precise to believe, note the quite large sample size of N=10,000. As established above, confidence intervals based on a large N and on proportions reflecting relatively weak variability (after all, the sample statistic indicated that managerial positions are predominantly occupied by men) tend to have small standard errors, due to the relatively small numerator (the variability) and the large denominator (the sample size).
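The point about sample size can be seen directly by recomputing the standard error for Example 6.8’s proportion (p = 0.36) at several values of N (a sketch in Python; the two smaller sample sizes are hypothetical, added only for comparison):

```python
import math

p, z = 0.36, 1.96
for n in (100, 1600, 10000):
    se = math.sqrt(p * (1 - p) / n)           # standard error shrinks as N grows
    print(n, round(se, 4), round(z * se, 4))  # N, SE, 95% margin of error
# prints: 100 0.048 0.0941, then 1600 0.012 0.0235, then 10000 0.0048 0.0094
```

A hundredfold increase in N cuts the standard error tenfold, since N sits under a square root in the denominator.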
Now finally it’s your turn to try, first with means…
Do It! 6.1 Average Height of NHL Players, In Inches This Time
Let’s say that a random sample of N=900 past and present players in the National Hockey League finds that the average height of players is 73 inches, with a standard deviation of 3 inches. What can you say about the average height of NHL players as a whole? Construct a 95% and a 99% confidence interval for the average height of NHL players.
Answer: (72.8; 73.2) and (72.7; 73.3), respectively.
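You can verify the answer to Do It! 6.1 with a quick computation (a sketch in Python; 1.96 and 2.58 are the familiar 95% and 99% z-values):

```python
import math

mean, s, n = 73, 3, 900   # sample mean, standard deviation, sample size
se = s / math.sqrt(n)     # standard error of the mean: 3/30 = 0.1
for z in (1.96, 2.58):
    print(round(mean - z * se, 1), round(mean + z * se, 1))
# 72.8 73.2
# 72.7 73.3
```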
… And now with proportions.
Do It! 6.2 Paying Off Student Debt Within Three Years After Graduation
Let’s say that a sample of N=1,600 finds that only 34 percent of Canadians with a bachelor’s degree have paid off their student loans within three years after graduation. Can you estimate the rate for all Canadians with a bachelor’s degree? Construct both a 95% and a 99% confidence interval for that rate.
Answer: (31.7; 36.3) and (31; 37), respectively.
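Likewise for Do It! 6.2 (a sketch in Python; results are printed as percentages, which round to the intervals given in the answer):

```python
import math

p, n = 0.34, 1600                # sample proportion and sample size
se = math.sqrt(p * (1 - p) / n)  # estimated standard error
for z in (1.96, 2.58):           # 95% and 99% z-values
    lo, hi = p - z * se, p + z * se
    print(round(100 * lo, 1), round(100 * hi, 1))
# 31.7 36.3
# 30.9 37.1
```

Rounded to whole percentage points, the 99% interval is (31; 37), as given in the answer.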
To summarize, confidence intervals allow us to estimate population parameters with a specific level of precision and certainty. We construct them based on the idea of the (normally distributed) sampling distribution of the mean (or the proportion), using the CLT’s postulates: centering the interval on the sample mean (or proportion) and taking a given number of standard errors below and above it. That number of standard errors (i.e., the z-value) determines the interval’s confidence (i.e., certainty in terms of probability) level.
Before we move on to variable associations (along with further uses of confidence intervals in statistical inference; you didn’t think it was just this, did you?), let’s finally address the glaring omission in my presentation so far: how come we can simply use the sample standard deviation s instead of the population standard deviation σ in calculating the standard error? I have left that explanation for last; it comes in the next section.
- In this chapter I have presented the most commonly used interpretation of confidence intervals, and the one most frequently taught to introductory statistics students. I should point out, however, that this is one of those instances (of which I spoke in the introduction to this book) where the reality is a bit different from what is being taught. The interpretation presented here is easier to understand and follows a logic that is more intuitive to students than what confidence intervals really tell us. Briefly, the range of plausible values we find is just that: values that the population could have, as we haven’t ruled them out yet, and 95% (or 99%) of the time such studies will not be able to rule these plausible values out (van der Zee, 2017, “How (Not) To Interpret Confidence Intervals”). This, technically speaking, is somewhat different from the “95% (or 99%) certainty that the population mean/proportion will be between the calculated error bounds” version we usually work with. If you’d like to go down that particular rabbit hole, go here: http://www.timvanderzee.com/not-interpret-confidence-intervals/. For everyone else, the interpretation of confidence intervals presented so far in this chapter should be enough. ↵