{"id":247,"date":"2023-12-21T11:19:35","date_gmt":"2023-12-21T16:19:35","guid":{"rendered":"https:\/\/pressbooks.bccampus.ca\/1130sandbox\/?post_type=chapter&#038;p=247"},"modified":"2024-06-26T15:29:06","modified_gmt":"2024-06-26T19:29:06","slug":"the-central-limit-theorem","status":"publish","type":"chapter","link":"https:\/\/pressbooks.bccampus.ca\/1130sandbox\/chapter\/the-central-limit-theorem\/","title":{"raw":"The Central Limit Theorem and Sampling Distributions","rendered":"The Central Limit Theorem and Sampling Distributions"},"content":{"raw":"<div class=\"textbox textbox--learning-objectives\"><header class=\"textbox__header\">\r\n<p class=\"textbox__title\">Learning Objectives<\/p>\r\n\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n\r\nIn this section, you will learn about:\r\n<ul>\r\n \t<li>The Central Limit Theorem<\/li>\r\n \t<li>Sampling Distributions<\/li>\r\n<\/ul>\r\n<\/div>\r\n<\/div>\r\n<h2>The Central Limit Theorem<\/h2>\r\nThe Central Limit Theorem states that when the sample size is sufficiently large:\r\n<ul>\r\n \t<li>The distribution of the sample means (i.e., the distribution of the [latex]\\bar{x}[\/latex]'s) is approximately normally distributed about the true population mean [latex]\\mu[\/latex].<\/li>\r\n \t<li>This distribution is called the <a href=\"https:\/\/www.khanacademy.org\/math\/statistics-probability\/sampling-distributions-library\">sampling distribution<\/a> (see more below).<\/li>\r\n \t<li>The standard deviation of the sample means, called the standard error, is: [latex]\\sigma_{\\bar{x}}=\\frac{\\sigma}{\\sqrt{n}}[\/latex]<\/li>\r\n \t<li>The z-score (the number of standard errors a sample mean lies from the population mean) for sampling distributions is: [latex] z = \\frac{\\bar{x} -\\mu}{\\sigma_{\\bar{x}}}[\/latex]<\/li>\r\n<\/ul>\r\n<h2>The Sampling Distribution<\/h2>\r\nIf we were to take sample after sample (of a large enough sample size) from the population, <strong><em>the distribution of the sample means <\/em><\/strong>(i.e., the distribution of all the [latex]\\bar{x}[\/latex]'s) 
would form an approximately normal distribution centered on the true population mean. We call this distribution the <strong><em>sampling <\/em><\/strong><strong><em>distribution.\u00a0<\/em><\/strong>See Figure 45.1 below to better understand this.\r\n\r\n[caption id=\"attachment_2197\" align=\"aligncenter\" width=\"537\"]<a style=\"font-weight: bold;font-size: 14pt\" href=\"https:\/\/pressbooks.bccampus.ca\/1130sandbox\/wp-content\/uploads\/sites\/2128\/2023\/12\/SamplingDistributions.jpg\"><img class=\"wp-image-2197 size-full\" src=\"https:\/\/pressbooks.bccampus.ca\/1130sandbox\/wp-content\/uploads\/sites\/2128\/2023\/12\/SamplingDistributions.jpg\" alt=\"Image with three different populations. The sample size is increased for each and the shape of the distribution starts as non-normal for small sample sizes and becomes normal for n equal to 30.\" width=\"537\" height=\"513\" \/><\/a> Figure 45.1 The shape of the sampling distributions becomes Normal as n increases.[\/caption]\r\n<h1>Comments on Sampling Distributions<\/h1>\r\nFigure 45.1 shows that in selecting simple random samples of size <em>n <\/em>from various populations:\r\n<ul>\r\n \t<li>The sampling distribution of the sample means (the [latex]\\bar{x}[\/latex]'s) can be approximated by a normal probability distribution as the sample size becomes sufficiently large.<\/li>\r\n \t<li>Historically, 'sufficiently large' was deemed to be thirty or more. 
That may not be large enough in some instances, for example when the population is strongly skewed.<\/li>\r\n<\/ul>\r\n<h2>Further Comments on Sampling Distributions<\/h2>\r\n<ul>\r\n \t<li>Note that the original populations I, II and III have very different distributions, yet the sampling distributions of the sample means (the [latex]\\bar{x}[\/latex]'s) are all approximately normal and centered on the true population mean.<\/li>\r\n \t<li>This is remarkable because it allows us to use the normal distribution to estimate the true characteristics of the underlying population.<\/li>\r\n \t<li>The Central Limit Theorem (CLT) forms the underlying theory behind the next two topics and chapters in this text: Estimation and Hypothesis Testing.<\/li>\r\n<\/ul>\r\n<h1>Describing the Sampling Distribution<\/h1>\r\nWe can describe the sampling distribution by its shape, its mean and its standard deviation. Figure 45.1 shows the sampling distribution of sample means, whose mean is denoted by [latex]\\mu_{\\bar{x}}[\/latex] and whose standard deviation is the standard error [latex]\\sigma_{\\bar{x}}[\/latex].\r\n<h2>Notation<\/h2>\r\nBoth the mean and the standard deviation of the sampling distribution carry the subscript [latex]\\bar{x}[\/latex]<em>. 
<\/em>As shown in Figure 45.1:\r\n<ul>\r\n \t<li>The mean of the sampling distribution [latex]\\mu_{\\bar{x}}[\/latex] is centered on the true population mean.<\/li>\r\n \t<li>Thus, we can say that [latex]\\mu_{\\bar{x}}= \\mu[\/latex]<\/li>\r\n \t<li>The variability of the sampling distribution is determined by the standard error [latex]\\sigma_{\\bar{x}}[\/latex].<\/li>\r\n \t<li>This is the standard deviation of the sample means.<\/li>\r\n \t<li>The standard error is equal to the following: [latex]\\sigma_{\\bar{x}}=\\frac{\\sigma}{\\sqrt{n}}[\/latex]<\/li>\r\n<\/ul>\r\n<h1>Effects of Increasing the Sample Size<\/h1>\r\n<ul>\r\n \t<li>The sampling distribution becomes more and more bell shaped.<\/li>\r\n \t<li>The standard error, [latex]\\sigma_{\\bar{x}}=\\frac{\\sigma}{\\sqrt{n}}[\/latex], decreases.<\/li>\r\n \t<li><span style=\"text-align: initial\">The sampling distribution gets<\/span><span style=\"text-align: initial\"> narrower.<\/span><\/li>\r\n \t<li><span style=\"text-align: initial\">In other words, our estimate of the true average gets more precise as our sample size increases. <\/span><\/li>\r\n \t<li>Put another way, the individual sample means cluster more tightly around the population mean. The mean of the sampling distribution, [latex]\\mu_{\\bar{x}}[\/latex], equals the population mean for every sample size.<\/li>\r\n<\/ul>\r\n<h2>Why do we not always take a large sample size?<\/h2>\r\n<span style=\"text-align: initial\">One may ask: why not always take a larger sample? The answer is twofold: <\/span>\r\n<ol>\r\n \t<li><span style=\"text-align: initial\">First, as pointed out in previous sections, this would defeat the purpose of sampling to begin with, as it would cost more time and money. <\/span><\/li>\r\n \t<li><span style=\"text-align: initial\">Secondly, the reduction in the standard error is not directly proportional to the increase in the sample size. The standard error is proportional to [latex]\\frac{1}{\\sqrt{n}}[\/latex]<\/span><em style=\"text-align: initial\">. 
<\/em><span style=\"text-align: initial\">Thus, to cut our standard error in half, we must make our sample size 4 times as large (not 2!). Likewise, to reduce the standard error by two-thirds (to one-third of its value), we need a sample 9 times as large.<\/span><\/li>\r\n<\/ol>\r\n<h1>Sampling Distribution Parameters<\/h1>\r\nWe can describe the sampling distribution by its shape, mean and standard deviation (also known as the \"standard error\"). We know from the CLT that the sampling distribution will be approximately normal and centered on the true population mean when the sample size is sufficiently large (traditionally 30 or more). The mean of the sampling distribution [latex]\\mu_{\\bar{x}}[\/latex] is equal to [latex]\\mu[\/latex] (the population average). The standard error (the [latex]\\sigma[\/latex] of the sample means) is equal to:\r\n\r\n\\[\\sigma_{\\bar{x}}=\\frac{\\sigma}{\\sqrt{n}}\\]\r\n<div>\r\n\r\n(we will call this \"formula 10.1\")\r\n\r\n<\/div>\r\nWe can now update our formula for calculating the z-score when dealing with a sampling distribution question:\r\n\r\n\\[z = \\frac{\\bar{x} -\\mu}{\\sigma_{\\bar{x}}}\\]\r\n\r\n(we will call this \"formula 10.2\")\r\n\r\nNote that now we have both the sample mean and the population mean in the same question. 
Thus, we need to clearly distinguish the information provided about the sample from the information provided about the population.\r\n\r\nIn the next section we will demonstrate how the sampling distribution is used in various calculations.\r\n<h1>Three Reasons the Normal Distribution is Important<\/h1>\r\nAs discussed in the previous chapter, there are three reasons why it is important for us to study the Normal Distribution:\r\n<ol>\r\n \t<li>It naturally occurs in business and engineering.<\/li>\r\n \t<li>Sometimes it is simply easier to use as an approximation to other probability distributions (e.g., approximating a binomial distribution with a normal curve).<\/li>\r\n \t<li>Sample averages, from virtually any population distribution, become approximately normally distributed for large enough samples (according to the Central Limit Theorem).<\/li>\r\n<\/ol>\r\n<h1>Key Takeaways (EXERCISE)<\/h1>\r\n<div class=\"textbox textbox--key-takeaways\"><header class=\"textbox__header\">\r\n<p class=\"textbox__title\">Key Takeaways: An Introduction to Sampling<\/p>\r\n\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n\r\nDrag the words into the correct boxes for each section below:\r\n\r\n[h5p id=\"164\"]\r\n\r\n[h5p id=\"165\"]\r\n\r\n<\/div>\r\n<\/div>\r\n<h1>Your Own Notes (EXERCISE)<\/h1>\r\n<ul>\r\n \t<li>Are there any notes you want to take from this section? 
Is there anything you'd like to copy and paste below?<\/li>\r\n \t<li>These notes are for you only (they will not be stored anywhere)<\/li>\r\n \t<li>Make sure to download them at the end to use as a reference<\/li>\r\n<\/ul>\r\n[h5p id=\"16\"]","rendered":"<div class=\"textbox textbox--learning-objectives\">\n<header class=\"textbox__header\">\n<p class=\"textbox__title\">Learning Objectives<\/p>\n<\/header>\n<div class=\"textbox__content\">\n<p>In this section, you will learn about:<\/p>\n<ul>\n<li>The Central Limit Theorem<\/li>\n<li>Sampling Distributions<\/li>\n<\/ul>\n<\/div>\n<\/div>\n<h2>The Central Limit Theorem<\/h2>\n<p>The Central Limit Theorem states that when the sample size is sufficiently large:<\/p>\n<ul>\n<li>The distribution of the sample means (i.e., the distribution of the [latex]\\bar{x}[\/latex]&#8217;s) is approximately normally distributed about the true population mean [latex]\\mu[\/latex].<\/li>\n<li>This distribution is called the <a href=\"https:\/\/www.khanacademy.org\/math\/statistics-probability\/sampling-distributions-library\">sampling distribution<\/a> (see more below).<\/li>\n<li>The standard deviation of the sample means, called the standard error, is: [latex]\\sigma_{\\bar{x}}=\\frac{\\sigma}{\\sqrt{n}}[\/latex]<\/li>\n<li>The z-score (the number of standard errors a sample mean lies from the population mean) for sampling distributions is: [latex]z = \\frac{\\bar{x} -\\mu}{\\sigma_{\\bar{x}}}[\/latex]<\/li>\n<\/ul>\n<h2>The Sampling Distribution<\/h2>\n<p>If we were to take sample after sample (of a large enough sample size) from the population, <strong><em>the distribution of the sample means <\/em><\/strong>(i.e., the distribution of all the [latex]\\bar{x}[\/latex]&#8217;s) would form an approximately normal distribution centered on the true population mean. 
We call this distribution the <strong><em>sampling <\/em><\/strong><strong><em>distribution.\u00a0<\/em><\/strong>See Figure 45.1 below to better understand this.<\/p>\n<figure id=\"attachment_2197\" aria-describedby=\"caption-attachment-2197\" style=\"width: 537px\" class=\"wp-caption aligncenter\"><a style=\"font-weight: bold;font-size: 14pt\" href=\"https:\/\/pressbooks.bccampus.ca\/1130sandbox\/wp-content\/uploads\/sites\/2128\/2023\/12\/SamplingDistributions.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-2197 size-full\" src=\"https:\/\/pressbooks.bccampus.ca\/1130sandbox\/wp-content\/uploads\/sites\/2128\/2023\/12\/SamplingDistributions.jpg\" alt=\"Image with three different populations. The sample size is increased for each and the shape of the distribution starts as non-normal for small sample sizes and becomes normal for n equal to 30.\" width=\"537\" height=\"513\" srcset=\"https:\/\/pressbooks.bccampus.ca\/1130sandbox\/wp-content\/uploads\/sites\/2128\/2023\/12\/SamplingDistributions.jpg 537w, https:\/\/pressbooks.bccampus.ca\/1130sandbox\/wp-content\/uploads\/sites\/2128\/2023\/12\/SamplingDistributions-300x287.jpg 300w, https:\/\/pressbooks.bccampus.ca\/1130sandbox\/wp-content\/uploads\/sites\/2128\/2023\/12\/SamplingDistributions-65x62.jpg 65w, https:\/\/pressbooks.bccampus.ca\/1130sandbox\/wp-content\/uploads\/sites\/2128\/2023\/12\/SamplingDistributions-225x215.jpg 225w, https:\/\/pressbooks.bccampus.ca\/1130sandbox\/wp-content\/uploads\/sites\/2128\/2023\/12\/SamplingDistributions-350x334.jpg 350w\" sizes=\"auto, (max-width: 537px) 100vw, 537px\" \/><\/a><figcaption id=\"caption-attachment-2197\" class=\"wp-caption-text\">Figure 45.1 The shape of the sampling distributions becomes Normal as n increases.<\/figcaption><\/figure>\n<h1>Comments on Sampling Distributions<\/h1>\n<p>Figure 45.1 shows that in selecting simple random samples of size <em>n <\/em>from various populations:<\/p>\n<ul>\n<li>The sampling distribution of the 
sample means (the [latex]\\bar{x}[\/latex]&#8217;s) can be approximated by a normal probability distribution as the sample size becomes sufficiently large.<\/li>\n<li>Historically, &#8216;sufficiently large&#8217; was deemed to be thirty or more. That may not be large enough in some instances, for example when the population is strongly skewed.<\/li>\n<\/ul>\n<h2>Further Comments on Sampling Distributions<\/h2>\n<ul>\n<li>Note that the original populations I, II and III have very different distributions, yet the sampling distributions of the sample means (the [latex]\\bar{x}[\/latex]&#8217;s) are all approximately normal and centered on the true population mean.<\/li>\n<li>This is remarkable because it allows us to use the normal distribution to estimate the true characteristics of the underlying population.<\/li>\n<li>The Central Limit Theorem (CLT) forms the underlying theory behind the next two topics and chapters in this text: Estimation and Hypothesis Testing.<\/li>\n<\/ul>\n<h1>Describing the Sampling Distribution<\/h1>\n<p>We can describe the sampling distribution by its shape, its mean and its standard deviation. Figure 45.1 shows the sampling distribution of sample means, whose mean is denoted by [latex]\\mu_{\\bar{x}}[\/latex] and whose standard deviation is the standard error [latex]\\sigma_{\\bar{x}}[\/latex].<\/p>\n<h2>Notation<\/h2>\n<p>Both the mean and the standard deviation of the sampling distribution carry the subscript [latex]\\bar{x}[\/latex]<em>. 
<\/em>As shown in Figure 45.1:<\/p>\n<ul>\n<li>The mean of the sampling distribution [latex]\\mu_{\\bar{x}}[\/latex] is centered on the true population mean.<\/li>\n<li>Thus, we can say that [latex]\\mu_{\\bar{x}}= \\mu[\/latex]<\/li>\n<li>The variability of the sampling distribution is determined by the standard error [latex]\\sigma_{\\bar{x}}[\/latex].<\/li>\n<li>This is the standard deviation of the sample means.<\/li>\n<li>The standard error is equal to the following: [latex]\\sigma_{\\bar{x}}=\\frac{\\sigma}{\\sqrt{n}}[\/latex]<\/li>\n<\/ul>\n<h1>Effects of Increasing the Sample Size<\/h1>\n<ul>\n<li>The sampling distribution becomes more and more bell shaped.<\/li>\n<li>The standard error, [latex]\\sigma_{\\bar{x}}=\\frac{\\sigma}{\\sqrt{n}}[\/latex], decreases.<\/li>\n<li><span style=\"text-align: initial\">The sampling distribution gets<\/span><span style=\"text-align: initial\"> narrower.<\/span><\/li>\n<li><span style=\"text-align: initial\">In other words, our estimate of the true average gets more precise as our sample size increases. <\/span><\/li>\n<li>Put another way, the individual sample means cluster more tightly around the population mean. The mean of the sampling distribution, [latex]\\mu_{\\bar{x}}[\/latex], equals the population mean for every sample size.<\/li>\n<\/ul>\n<h2>Why do we not always take a large sample size?<\/h2>\n<p><span style=\"text-align: initial\">One may ask: why not always take a larger sample? The answer is twofold: <\/span><\/p>\n<ol>\n<li><span style=\"text-align: initial\">First, as pointed out in previous sections, this would defeat the purpose of sampling to begin with, as it would cost more time and money. <\/span><\/li>\n<li><span style=\"text-align: initial\">Secondly, the reduction in the standard error is not directly proportional to the increase in the sample size. The standard error is proportional to [latex]\\frac{1}{\\sqrt{n}}[\/latex]<\/span><em style=\"text-align: initial\">. 
<\/em><span style=\"text-align: initial\">Thus, to cut our standard error in half, we must make our sample size 4 times as large (not 2!). Likewise, to reduce the standard error by two-thirds (to one-third of its value), we need a sample 9 times as large.<\/span><\/li>\n<\/ol>\n<h1>Sampling Distribution Parameters<\/h1>\n<p>We can describe the sampling distribution by its shape, mean and standard deviation (also known as the &#8220;standard error&#8221;). We know from the CLT that the sampling distribution will be approximately normal and centered on the true population mean when the sample size is sufficiently large (traditionally 30 or more). The mean of the sampling distribution [latex]\\mu_{\\bar{x}}[\/latex] is equal to [latex]\\mu[\/latex] (the population average). The standard error (the [latex]\\sigma[\/latex] of the sample means) is equal to:<\/p>\n<p>\\[\\sigma_{\\bar{x}}=\\frac{\\sigma}{\\sqrt{n}}\\]<\/p>\n<div>\n<p>(we will call this &#8220;formula 10.1&#8221;)<\/p>\n<\/div>\n<p>We can now update our formula for calculating the z-score when dealing with a sampling distribution question:<\/p>\n<p>\\[z = \\frac{\\bar{x} -\\mu}{\\sigma_{\\bar{x}}}\\]<\/p>\n<p>(we will call this &#8220;formula 10.2&#8221;)<\/p>\n<p>Note that now we have both the sample mean and the population mean in the same question. 
Thus, we need to clearly distinguish the information provided about the sample from the information provided about the population.<\/p>\n<p>In the next section we will demonstrate how the sampling distribution is used in various calculations.<\/p>\n<h1>Three Reasons the Normal Distribution is Important<\/h1>\n<p>As discussed in the previous chapter, there are three reasons why it is important for us to study the Normal Distribution:<\/p>\n<ol>\n<li>It naturally occurs in business and engineering.<\/li>\n<li>Sometimes it is simply easier to use as an approximation to other probability distributions (e.g., approximating a binomial distribution with a normal curve).<\/li>\n<li>Sample averages, from virtually any population distribution, become approximately normally distributed for large enough samples (according to the Central Limit Theorem).<\/li>\n<\/ol>\n<h1>Key Takeaways (EXERCISE)<\/h1>\n<div class=\"textbox textbox--key-takeaways\">\n<header class=\"textbox__header\">\n<p class=\"textbox__title\">Key Takeaways: An Introduction to Sampling<\/p>\n<\/header>\n<div class=\"textbox__content\">\n<p>Drag the words into the correct boxes for each section below:<\/p>\n<div id=\"h5p-164\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-164\" class=\"h5p-iframe\" data-content-id=\"164\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"The Central Limit Theorem and Sampling Distributions Key Takeaways\"><\/iframe><\/div>\n<\/div>\n<div id=\"h5p-165\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-165\" class=\"h5p-iframe\" data-content-id=\"165\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"The Central Limit Theorem and Sampling Distributions Solutions\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<\/div>\n<h1>Your Own Notes (EXERCISE)<\/h1>\n<ul>\n<li>Are there any notes you want to take from this section? 
Is there anything you&#8217;d like to copy and paste below?<\/li>\n<li>These notes are for you only (they will not be stored anywhere)<\/li>\n<li>Make sure to download them at the end to use as a reference<\/li>\n<\/ul>\n<div id=\"h5p-16\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-16\" class=\"h5p-iframe\" data-content-id=\"16\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"Key takeaways, notes and comments from this section document tool.\"><\/iframe><\/div>\n<\/div>\n","protected":false},"author":883,"menu_order":4,"template":"","meta":{"pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[],"contributor":[],"license":[],"class_list":["post-247","chapter","type-chapter","status-publish","hentry"],"part":320,"_links":{"self":[{"href":"https:\/\/pressbooks.bccampus.ca\/1130sandbox\/wp-json\/pressbooks\/v2\/chapters\/247","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/pressbooks.bccampus.ca\/1130sandbox\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/pressbooks.bccampus.ca\/1130sandbox\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/pressbooks.bccampus.ca\/1130sandbox\/wp-json\/wp\/v2\/users\/883"}],"version-history":[{"count":25,"href":"https:\/\/pressbooks.bccampus.ca\/1130sandbox\/wp-json\/pressbooks\/v2\/chapters\/247\/revisions"}],"predecessor-version":[{"id":2259,"href":"https:\/\/pressbooks.bccampus.ca\/1130sandbox\/wp-json\/pressbooks\/v2\/chapters\/247\/revisions\/2259"}],"part":[{"href":"https:\/\/pressbooks.bccampus.ca\/1130sandbox\/wp-json\/pressbooks\/v2\/parts\/320"}],"metadata":[{"href":"https:\/\/pressbooks.bccampus.ca\/1130sandbox\/wp-json\/pressbooks\/v2\/chapters\/247\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/pressbooks.bccampus.ca\/1130sandbox\/wp-json\/wp\/v2\/media?parent=247"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/
\/pressbooks.bccampus.ca\/1130sandbox\/wp-json\/pressbooks\/v2\/chapter-type?post=247"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/pressbooks.bccampus.ca\/1130sandbox\/wp-json\/wp\/v2\/contributor?post=247"},{"taxonomy":"license","embeddable":true,"href":"https:\/\/pressbooks.bccampus.ca\/1130sandbox\/wp-json\/wp\/v2\/license?post=247"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}