Confidence Intervals

Confidence Intervals for Proportions

Learning Objectives

In this section, we will construct confidence intervals to estimate true population proportions as well as determine required sample sizes to reduce the margin of error below a certain limit.

Constructing Proportion Confidence Intervals

When we want to understand the true percentage or fraction of a population that has some characteristic, we estimate the true proportion of the population ([latex]p[/latex]). The calculations we will need to perform are the following:

  • Sample proportion: [latex]\bar{p} =\frac{x}{n}[/latex]
  • Standard deviation of the sample proportion (the standard error): [latex]\sigma_{\bar{p}}=\sqrt{\frac{\bar{p}(1-\bar{p})}{n}}[/latex]
  • Margin of error: [latex]E = z \cdot \sqrt{\frac{\bar{p}(1-\bar{p})}{n}}= z \cdot \sigma_{\bar{p}}[/latex]
  • [latex]z[/latex]-score: [latex]z = \text{abs}\left(\text{NORM.S.INV}\left(\frac{\alpha}{2}\right)\right)[/latex], where [latex]\alpha = 1 - \text{confidence level}[/latex]

We can now construct the confidence interval:

  • Confidence interval lower limit: [latex]CL_{Lower} =\bar{p} - z \cdot \sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = \bar{p} - E[/latex]
  • Confidence interval upper limit: [latex]CL_{Upper} =\bar{p} + z \cdot \sqrt{\frac{\bar{p}(1-\bar{p})}{n}}= \bar{p} + E[/latex]

Note: We once again use [latex]z[/latex]-scores when dealing with proportions, just as in the “sigma known” estimation problems.
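The formulas above can be sketched in Python as a single helper. This is a minimal illustration, not part of the original text: the function name `proportion_ci` and the data (120 successes out of 200 at 95% confidence) are hypothetical, and `statistics.NormalDist().inv_cdf` from the standard library is assumed to play the role of Excel's NORM.S.INV.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(x, n, confidence):
    """Return (p_bar, margin_of_error, lower, upper) for a sample proportion."""
    alpha = 1 - confidence
    # Positive critical value, the counterpart of ABS(NORM.S.INV(alpha/2)) in Excel
    z = abs(NormalDist().inv_cdf(alpha / 2))
    p_bar = x / n
    std_error = sqrt(p_bar * (1 - p_bar) / n)
    margin = z * std_error
    return p_bar, margin, p_bar - margin, p_bar + margin


# Hypothetical data for illustration only: 120 successes in a sample of 200
p_bar, margin, lower, upper = proportion_ci(x=120, n=200, confidence=0.95)
print(f"p_bar = {p_bar:.3f}, E = {margin:.4f}, CI = ({lower:.4f}, {upper:.4f})")
```

Returning the margin of error alongside the limits keeps the relationship between the limits and [latex]\bar{p} \pm E[/latex] visible in the output.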

Calculating Required Sample Sizes

If we are given, or want to achieve, a maximum margin of error, we can calculate the sample size required to reach it at a given confidence level. The formula is slightly different if we do not have a sample proportion or any idea of what the population proportion is (in that case we use 50% as the estimate).

  • Sample size if [latex]\bar{p}[/latex] is known: [latex]n =\left(\frac{z}{E}\right)^2 \bar{p}(1-\bar{p})[/latex]
  • Sample size if [latex]\bar{p}[/latex] is unknown: [latex]n =\left(\frac{z}{E}\right)^2 (0.5)(1-0.5)=0.25\left(\frac{z}{E}\right)^2[/latex].
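Both sample size formulas can be handled by one function if 0.5 is the default when no estimate is available. The sketch below is illustrative only: the name `required_sample_size` and the inputs (a 0.05 margin at 95% confidence, and a prior estimate of 0.3) are hypothetical. The result is rounded up to the next whole number, since a sample size must be a whole number of observations.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(margin, confidence, p_bar=0.5):
    """Sample size needed so the margin of error is at most `margin`.

    p_bar defaults to 0.5, the value used when no prior estimate exists.
    """
    z = abs(NormalDist().inv_cdf((1 - confidence) / 2))
    n = (z / margin) ** 2 * p_bar * (1 - p_bar)
    return ceil(n)  # always round up to the next whole observation


# Hypothetical inputs for illustration only
n_no_prior = required_sample_size(margin=0.05, confidence=0.95)
n_with_prior = required_sample_size(margin=0.05, confidence=0.95, p_bar=0.3)
print(n_no_prior, n_with_prior)
```

With no prior estimate the required sample is larger, which is why 0.5 acts as the conservative choice.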

Distribution of Sample Proportions

Once again we rely on our understanding of the sampling distribution to estimate the true proportion of the population. As we increase the sample size, the distribution of sample proportions ([latex]\bar{p}[/latex]) gets closer and closer to a normal distribution. So, as long as the sample size is sufficiently large, we can assume that the sample proportions are normally distributed.

We now place [latex]p[/latex] at the center of our sampling distribution. The variability of the sampling distribution is described by the standard deviation formula for [latex]\bar{p}[/latex] given above.
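A small simulation can make the sampling distribution concrete. The sketch below assumes a hypothetical true proportion of 0.7, samples of size 100, and 2000 repeated samples; none of these values come from the text. Because the true [latex]p[/latex] is known inside a simulation, the standard deviation formula is evaluated with [latex]p[/latex] rather than [latex]\bar{p}[/latex].

```python
import random
from math import sqrt
from statistics import mean, pstdev

random.seed(1)  # fixed seed so the run is repeatable

p = 0.7       # assumed true proportion (hypothetical)
n = 100       # size of each sample
reps = 2000   # number of simulated samples

# Each sample proportion is the fraction of n random draws that are "successes"
sample_props = [
    sum(random.random() < p for _ in range(n)) / n
    for _ in range(reps)
]

theoretical_sd = sqrt(p * (1 - p) / n)
print(f"mean of sample proportions: {mean(sample_props):.4f} (true p = {p})")
print(f"sd of sample proportions:   {pstdev(sample_props):.4f} "
      f"(formula gives {theoretical_sd:.4f})")
```

The simulated sample proportions should cluster around the true proportion, with a spread close to the value the formula gives.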

We will now present a numerical problem for estimation of the proportion of a population.

Example for this Section (You Try)

Example 54.1

In a random sample of 40 users of a new software program, 28 said that they liked using it over the older program.

    1. With 99% confidence, what can we say about the maximum size of our error (i.e., the margin of error) in estimating the true percentage of people who prefer the new software program?
    2. Develop a 99% confidence interval estimate of the population proportion.
    3. Using a 99% confidence level, how large a sample should be taken to obtain a margin of error for the estimation of the population proportion of 0.10?
    4. If we had no previous knowledge about the proportion of users who liked the new program, find the sample size that would be required in order to be 99% confident that the maximum error would be 0.10.

You Try: Try using the formulas at the top of this section to solve the questions above. The method is quite similar to the calculations required to build confidence intervals for means, so do your best! Check the solutions below if you get stuck or want to check your answer.

Excel solutions: Try doing these problems in Excel, or, after you have tried them by hand, use Excel to get the z-score and carry out the calculations. At the end, use the Excel solutions below to check your answers or if you get stuck on any of the parts.

Written solution to part a)

a) With 99% confidence, what can we say about the maximum size of our error (i.e., the margin of error) in estimating the true percentage of people who prefer the new software program?

Two pieces of information in the problem identify the type of question we are presented with. First, the word “confidence” lets us know that this is an estimation problem; second, “true percentage” tells us that we are dealing with the third area of estimation: the true proportion of a population.

The next question you need to ask yourself is what is the question asking us to solve? In other words, what does “the maximum size of our error” or the “margin of error” refer to? The margin of error refers to a value of “E” that represents the maximum distance from the center of the distribution to one confidence limit at the given confidence level of 99%. Thus we calculate the margin of error:

\[ E = z \cdot \sqrt{\frac{\bar{p}(1-\bar{p})}{n}} \]

Note that we must first calculate [latex]\bar{p}[/latex] (our point estimate of the true proportion of people who prefer the new software product). The question tells us that out of 40 randomly selected people (i.e., our sample size n = 40), 28 preferred the new software product:

\[ \bar{p} = \frac{28}{40} = 0.70\]

Thus, a point estimate of the true proportion of the population who prefer the new software is 70%. We also need the z-score that corresponds to the given confidence level of 99%. To find this z-score, we use NORM.S.INV:

\[z =\text{abs}(\text{NORM.S.INV}(0.005)) = 2.576\]

We can calculate the margin of error:

[latex]\begin{align*} E &= z \cdot \sqrt{\frac{\bar{p}(1-\bar{p})}{n}}\\ &=2.576 \cdot \sqrt{\frac{0.7(1-0.7)}{40}}\\ &=0.18664 \end{align*}[/latex]

Thus we can be 99% confident that the maximum error of our estimate of the true proportion is 0.1866, or 18.66% (rounded to two decimal places). We can now place this value on our sampling distribution. Recall that this value represents the distance from the center of our diagram to each confidence limit. This tells us that we are 99% sure that the true proportion (of people who prefer the new software) will fall within 18.7 percentage points of our sample proportion of 70%.
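The arithmetic in part a) can be checked with a few lines of Python. This is only a sketch: it assumes the standard library's `statistics.NormalDist().inv_cdf` as a stand-in for Excel's NORM.S.INV, and the variable names are chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

x, n = 28, 40        # 28 of the 40 sampled users preferred the new program
confidence = 0.99

p_bar = x / n
z = abs(NormalDist().inv_cdf((1 - confidence) / 2))  # alpha/2 = 0.005
E = z * sqrt(p_bar * (1 - p_bar) / n)

print(f"p_bar = {p_bar:.2f}, z = {z:.3f}, E = {E:.4f}")
# prints: p_bar = 0.70, z = 2.576, E = 0.1866
```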

Written solution to part b)

b) Develop a 99% confidence interval estimate of the population proportion.

This question is asking us to calculate the interval estimate (i.e., from confidence limit to confidence limit) of the true population proportion. To calculate the confidence limits of the true proportion we use the formula:

\[CL =\bar{p} \pm z \cdot \sqrt{\frac{\bar{p}(1-\bar{p})}{n}}\]

Note that another way we could write the above formula is:

\[CL = \bar{p} \pm E = 0.7 \pm 0.1866\]

Thus the confidence limits are [latex]0.70 -0.1866 = 0.5134[/latex] and [latex]0.70 +0.1866 = 0.8866[/latex]

Final Statement: Based on these results, I can be 99% confident that the true proportion of users (i.e., [latex]\pi[/latex] or [latex]p[/latex]) who would like the new program is between 51.34% and 88.66% of the population.
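As a quick check of the confidence limits, the interval can be computed directly from the sample counts. This is a hedged sketch with illustrative variable names, again assuming `statistics.NormalDist().inv_cdf` in place of NORM.S.INV.

```python
from math import sqrt
from statistics import NormalDist

x, n, confidence = 28, 40, 0.99

p_bar = x / n
z = abs(NormalDist().inv_cdf((1 - confidence) / 2))
E = z * sqrt(p_bar * (1 - p_bar) / n)

# Confidence limits: point estimate minus/plus the margin of error
lower, upper = p_bar - E, p_bar + E
print(f"99% CI: {lower:.2%} to {upper:.2%}")
# prints: 99% CI: 51.34% to 88.66%
```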

Written solution to part c)

c) Using a 99% confidence level, how large a sample should be taken to obtain a margin of error for the estimation of the population proportion of 0.10?

Note that in the concluding statement to part b) the margin of error is quite large. We found that the true proportion of users who like the software may fall anywhere between 51.3% and 88.7% of the population. This question asks us to find the sample size that would reduce our margin of error to 10%. In other words, we would like our estimate of the proportion of users who like the software to be within 10 percentage points of the true proportion of the population.

For calculating sample size for a proportion type question we use the formula:

\[n = \left(\frac{z}{E}\right)^2 \bar{p}(1-\bar{p})\]

This formula requires us to have knowledge of 3 variables:

  • E is the margin of error that is typically supplied in the question. In part c) it is provided as 10%.
  • [latex]z[/latex] is a z-score that can be derived from the confidence level (again provided within the question). Once again we have a 99% confidence level, and therefore we know the corresponding z-score is 2.576.
  • [latex]\bar{p}[/latex] : the point estimate of the true proportion of the population.

There is a bit of a circular argument here. The formula asks us for [latex]\bar{p}[/latex] to calculate the sample size, and yet we will be using the sample to calculate [latex]\bar{p}[/latex]! The rule of thumb for calculating the sample size required for a proportion is to use any prior estimate of the proportion if possible. For example, there may be a prior estimate from a pilot study or previous survey that relates to the current investigation. Of course, if it does not relate to the current investigation it should not be used. In this example, we have a prior point estimate of 70% for the proportion of users who like the software product over the older version. Thus we may use [latex]\bar{p}=0.7[/latex] in our calculation:

[latex]\begin{align} n &= \left(\frac{z}{E}\right)^2 \bar{p}(1-\bar{p})\\ & = \left(\frac{2.576}{0.10}\right)^2 0.7(1-0.7)\\ &= 139.33 \end{align}[/latex]

As we saw in the large sample case, when calculating the required sample size we always round up to the next higher integer value. Thus, in order to be 99% certain that the true proportion will be within 10% of our sample proportion, we need to have a sample size of 140 software users in our study.
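The part c) calculation can be reproduced as follows. The sketch compares the unrounded z-score with the rounded value 2.576 to show that the choice changes the second decimal of the raw result but not the final sample size; the variable names are illustrative only.

```python
from math import ceil
from statistics import NormalDist

E_target = 0.10   # desired margin of error
p_bar = 0.7       # prior estimate from the sample of 40 users

z_exact = abs(NormalDist().inv_cdf(0.005))  # about 2.575829
z_rounded = 2.576

n_exact = (z_exact / E_target) ** 2 * p_bar * (1 - p_bar)
n_rounded = (z_rounded / E_target) ** 2 * p_bar * (1 - p_bar)

print(f"unrounded z: n = {n_exact:.2f}")    # prints: unrounded z: n = 139.33
print(f"rounded z:   n = {n_rounded:.2f}")  # prints: rounded z:   n = 139.35
print(f"required sample size: {ceil(n_exact)}")  # prints: required sample size: 140
```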

Written solution to part d)

d) If we had no previous knowledge about the proportion of users who liked the new program, find the sample size that would be required in order to be 99% confident that the maximum error would be 10%.

This question relates to the previous one, except that we do not have a prior estimate of [latex]\bar{p}[/latex] with which to calculate the sample size. If no prior estimate of [latex]\bar{p}[/latex] is available, the default is to use 0.50 in the sample size formula (formula #10). This is because the product [latex]\bar{p}(1-\bar{p})[/latex] is largest when [latex]\bar{p} = 0.50[/latex], so a value of 0.50 generates the largest sample size possible for a given margin of error and confidence level. Thus here we use [latex]\bar{p}= 0.50[/latex] in our calculation:

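[latex]\begin{align*} n &= \left(\frac{z}{E}\right)^2 \bar{p}(1-\bar{p})\\ & = \left(\frac{2.576}{0.10}\right)^2 \cdot  0.5 (1-0.5)\\ &= 165.87 \end{align*}[/latex]

(The value 165.87 uses the unrounded z-score 2.575829, as in the Excel solution; with the rounded value 2.576 the result is 165.89. Either way, the sample size rounds up to the same integer.)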

Again, we always round up to the next higher integer value when calculating sample size. Thus, in order to be 99% certain that the true proportion will be within 10% of our sample proportion when there is no prior estimate of the proportion, we need to have a sample size of 166 software users in our study.
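The claim that 0.50 produces the largest sample size can be checked numerically. The sketch below scans a grid of candidate proportions (the grid and the variable names are assumptions for illustration), picks the one that maximizes [latex]\bar{p}(1-\bar{p})[/latex], and then computes the part d) sample size.

```python
from math import ceil
from statistics import NormalDist

z = abs(NormalDist().inv_cdf(0.005))  # 99% confidence
E_target = 0.10

# Candidate proportions 0.00, 0.01, ..., 1.00
grid = [i / 100 for i in range(101)]
# The proportion that maximizes p(1 - p), and therefore the sample size
worst_p = max(grid, key=lambda p: p * (1 - p))

n_raw = (z / E_target) ** 2 * worst_p * (1 - worst_p)
print(worst_p, round(n_raw, 2), ceil(n_raw))
# prints: 0.5 165.87 166
```

Any other proportion on the grid gives a smaller product, so planning with 0.50 guarantees the margin of error target is met whatever the true proportion turns out to be.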

Excel solutions for this example


Example 54.1 Excel Solutions

Cell references are to column B of the worksheet; for example, conf level is in cell B3 and E is in cell B12.

part a)       Value       Excel formula
conf level    99%
alpha         1%          =1-B3
a/2           0.005       =B4/2
z             2.575829    =ABS(NORM.S.INV(B5))
x             28
n             40
pbar          0.7         =B7/B8
1-p_bar       0.3         =1-B9
sigma         0.072457    =SQRT(B9*B10/B8)
E             0.186637    =B11*B6

part b)       Value       Excel formula
CL_Lower      0.513363    =B9-B12
CL_Upper      0.886637    =B9+B12

part c)       Value       Excel formula
E_new         0.1
n_new         139.3328    =(B6/B19)^2*B9*B10
answer        140         =ROUNDUP(B20,0)

part d)       Value       Excel formula
p_new         0.5
q_new         0.5         =1-B24
E_new         0.1
n_new         165.8724    =(B6/B26)^2*B24*B25
answer        166         =ROUNDUP(B27,0)

License


An Introduction to Business Statistics for Analytics (1st Edition) Copyright © 2024 by Amy Goldlist; Charles Chan; Leslie Major; Michael Johnson is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, except where otherwise noted.
