Chapter 6 Sampling, the Basis of Inference

6.7.1 Additional Confidence Interval Considerations

Precision vs. certainty. One thing you might have noticed from the calculations in the examples in the previous section is that the more certainty you want, the larger your confidence interval becomes (or, vice versa, the smaller the interval, the less certain you can be of your estimate):

 

Based on the annual income details from Example 6.4, we had

  • between $49,700 and $50,300 with 68% confidence;
  • between $49,505 and $50,495 with 90% confidence;
  • between $49,412 and $50,588 with 95% confidence; and
  • between $49,226 and $50,774 with 99% confidence.
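The pattern in the list above can be reproduced with a few lines of code. This is a sketch (in Python, not part of the original example) using the chapter's numbers: a sample mean of $50,000 and an estimated standard error of $300, with the standard normal critical values used in the calculations above.

```python
# Sample mean and estimated standard error from Example 6.4
# (s = 12000, N = 1600, so SE = 12000 / 40 = 300).
xbar, se = 50000, 300

# Standard normal critical values for each confidence level,
# as used in the chapter's calculations.
z_values = {"68%": 1.00, "90%": 1.65, "95%": 1.96, "99%": 2.58}

for level, z in z_values.items():
    margin = z * se  # half-width of the interval
    print(f"{level} CI: ({xbar - margin:.0f}; {xbar + margin:.0f})")
```

Running this prints the same four intervals, widening as the confidence level rises.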

 

Of course, who wouldn’t want estimates that are both more precise and more certain? Unfortunately, there simply is no way to have our cake and eat it too: as you can see above, the more confident we become in our estimate, the wider the error bounds of the confidence interval spread. There is a trade-off between precision and confidence. The more precise our estimate, the less certain we are of it; the more confident we are in our estimate, the less precise our “guess” is.

 

Logically, this makes a lot of sense: imagine the population parameter as a target and estimation as throwing a dart at it. The smaller the target, the more precise your throw has to be, but also the less confident you can be of hitting it. At the same time, a larger target will accommodate less precise “shots” while simultaneously increasing the certainty of hitting it.

 

And why can’t we have a 100% CI? The non-technical answer is simply that a statistical estimator is based on a sample drawn from a population of interest: as long as you don’t have data on your entire population, there will always be a possibility of random error (and thus uncertainty).

 

The more technical answer lies in the characteristics of the normal probability distribution. Specifically, the normal curve approaches but never touches the horizontal axis: the probability in its “tails” is not bounded away from the axis, i.e., a probability exists for any z-value, no matter how large, and it never reaches 0. Thus, a 100% confidence interval would have to stretch from -∞ to +∞, i.e., it would be infinitely wide, to accommodate the perfect certainty. No bounded, finite interval can provide 100% certainty, by the nature of statistical inference itself. (Indeed, at 100% it would stop being inference altogether: we would have no need to estimate, as we would know.)
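You can see this unboundedness numerically. The sketch below (an illustration, using Python's standard library rather than anything from the chapter) shows how the critical z-value, and with it the interval's width, keeps growing as the confidence level approaches 100% — it never levels off.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal distribution

# Two-tailed critical z-value for increasingly high confidence levels.
# The z-value keeps growing without bound as confidence approaches 1.
for conf in (0.95, 0.99, 0.999, 0.99999, 0.9999999):
    z = std_normal.inv_cdf((1 + conf) / 2)
    print(f"confidence {conf}: z = {z:.2f}")
```

At 95% confidence z is about 1.96; each extra "9" of confidence pushes z (and the interval) out further, with no finite stopping point.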

 

The effect of sample size on confidence intervals. Let’s also consider the effect of sample size on the precision and level of certainty of confidence intervals. In Section 6.5 (https://pressbooks.bccampus.ca/simplestats/chapter/6-5-the-sampling-distribution/) I attempted to convince you that increasing the sample size beyond a certain (large) number becomes not only infeasible in a world of limited resources but also statistically pointless. Let’s see if I can further support that claim with the effect of sample size on the standard error.

 

To recall, we find the standard error in the following way:

 

\sigma_\overline{x} =\frac{\sigma}{\sqrt{N}}

 

where we estimate σ (the standard deviation of the population) with s (the standard deviation of the sample) to get

 

\hat\sigma_\overline{x} =s_\overline{x} =\frac{s}{\sqrt{N}}

 

We already established that a larger N would result in a smaller standard error (as N is in the denominator). Given the formula for calculating confidence intervals, a smaller standard error should in turn lead to smaller intervals (i.e., to more precise estimates) at a fixed level of certainty. The question is — how much smaller?
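The diminishing returns are visible directly in the formula: because N sits under a square root, quadrupling the sample size only halves the standard error. A quick sketch (in Python, using the chapter's s = 12,000):

```python
from math import sqrt

s = 12000  # sample standard deviation, as in Example 6.4

# Quadrupling N (100 -> 400 -> 1600 -> 6400) halves the SE each time.
for N in (100, 400, 1600, 6400, 10000):
    se = s / sqrt(N)  # estimated standard error of the mean
    print(f"N = {N:>6}: SE = {se:.0f}")
```

Going from N = 100 to N = 400 cuts the standard error from 1,200 to 600; going from 1,600 to 6,400 only cuts it from 300 to 150, at four times the data-collection cost.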

 

Example 6.5 The Effect of Sample Size on Confidence Intervals

Going back to our Average Annual Income (Example 6.4) specifications, we had that

 

N=1600

\overline{x}=50000

s=12000

 

We had also already calculated its 95% CI:

  • 95% CI: \overline{x}\pm1.96\times\hat\sigma_\overline{x} =50000\pm1.96\times300=50000\pm588= (49412; 50588).

What would happen if we increased the sample size to, say, N=10,000?

 

As usual, we start with calculating the standard error:

 

\hat\sigma_\overline{x} =s_\overline{x} =\frac{s}{\sqrt{N}}= \frac{12000}{\sqrt{10000}}=\frac{12000}{100}=120

 

Then, the new 95% CI would be

  • 95% CI: \overline{x}\pm1.96\times\hat\sigma_\overline{x} =50000\pm1.96\times120=50000\pm235= (49765; 50235).

To be sure, the larger-N confidence interval is smaller; we did gain precision. But consider what these numbers actually mean in dollar terms, had this been real-life research instead of a hypothetical example. With a sample of N=1,600 we found that, with 95% certainty, the average annual income for the population is between $49,412 and $50,588. We now find that, had we had a sample of N=10,000, the average annual income of the population would be between $49,765 and $50,235.
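The whole comparison can be condensed into a short sketch (in Python, with the numbers from Example 6.5; the helper function `margin_of_error` is introduced here for illustration only):

```python
from math import sqrt

# Example 6.5: sample mean, sample SD, and the 95% critical value.
xbar, s, z = 50000, 12000, 1.96

def margin_of_error(N):
    """z times the estimated standard error s / sqrt(N)."""
    return z * s / sqrt(N)

for N in (1600, 10000):
    m = margin_of_error(N)
    print(f"N = {N:>5}: 95% CI = ({xbar - m:.0f}; {xbar + m:.0f})")

# Precision gained per error bound by moving from N = 1600 to N = 10000:
print(f"gain per bound: ${margin_of_error(1600) - margin_of_error(10000):.0f}")
```

This reproduces the two intervals above and the roughly $353 gain per bound discussed next.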

 

The precision “gain” between the two sample sizes is $353 on each error bound; i.e., our estimate of the population’s average annual income becomes ±$353 more precise (a total “gain” of $706). At the same time, consider that surveying a sample of N=10,000 would cost more than six times as much as surveying one of N=1,600 (as 10,000 is 6.25 times 1,600). Would it be worth it to improve your estimate by only $350, give or take, on each side, when the actual sums we are dealing with are of a magnitude of tens of thousands of dollars?

 

Most people would agree that $49,412 to $50,588 is precise enough, and that there’s no need to waste six times more resources on such a relatively insignificant gain in precision when it comes to average annual income[1].

 

Bear in mind, however, that had we been discussing the effectiveness of a life-saving medical treatment instead of average annual income, our preferences regarding the trade-off between precision and cost would most likely be different. Thus, the actual value of increasing sample size cannot be judged solely on statistical grounds: what counts as a small, insignificant change in precision in one context may very well be a large and worthy one in another. Still, in social science research there is rarely a need to increase the precision of inference no matter the cost, even if larger samples are generally preferred[2].


  1. To demonstrate the effect of sample size only, this example keeps the other conditions (i.e., the sample mean and standard deviation) the same. Arguably, however, a larger N would have a mean and a standard deviation "truer" to the population. To the extent that a larger sample ends up with a smaller standard deviation, the standard error would be further reduced, and the confidence interval would be even narrower, thus gaining more precision. Still, the point of the effect of sample size per se remains.
  2. Large sample sizes are very useful for gaining power in detecting associations between variables, as you'll see in the remaining chapters.

License

Simple Stats Tools Copyright © by Mariana Gatzeva. All Rights Reserved.