{"id":103,"date":"2018-10-31T17:44:51","date_gmt":"2018-10-31T21:44:51","guid":{"rendered":"https:\/\/pressbooks.bccampus.ca\/simplestats\/?post_type=chapter&#038;p=103"},"modified":"2019-10-20T18:12:44","modified_gmt":"2019-10-20T22:12:44","slug":"6-8-the-t-distribution","status":"publish","type":"chapter","link":"https:\/\/pressbooks.bccampus.ca\/simplestats\/chapter\/6-8-the-t-distribution\/","title":{"raw":"6.8 The t-Distribution","rendered":"6.8 The t-Distribution"},"content":{"raw":"[latexpage]\r\n\r\nIf, having reached this chapter's final section, after all we had been through, random sampling, sampling distribution, CLT, parameters, estimates, statistics, confidence intervals, you are now groaning in dismay -- <em>why is there even more<\/em><span style=\"text-indent: 1em;font-size: 14pt\">\u00a0<em>to this topic??<\/em><\/span><span style=\"font-size: 14pt\">[footnote]As a general principle, in introductory texts such as this there is\u00a0<\/span><em style=\"font-size: 14pt\">always<\/em><span style=\"font-size: 14pt\">\u00a0more. Much, much more; it's not a matter of\u00a0<\/span><em style=\"font-size: 14pt\">if<\/em><span style=\"font-size: 14pt\">\u00a0<\/span><span style=\"font-size: 14pt\">but of\u00a0<\/span><em style=\"font-size: 14pt\">how much\u00a0<\/em><span style=\"font-size: 14pt\">something is left out<\/span><span style=\"font-size: 14pt\">. [\/footnote]<\/span><span style=\"text-indent: 1em;font-size: 14pt\">\u00a0-- take heart, this is a short explanation I kept for last, by way of a brief introduction to a new concept.<\/span>\r\n\r\n&nbsp;\r\n\r\nIf you recall, when we needed to calculate the standard error of the mean (or proportion) in the previous few sections, I simply replaced the <em>unknown<\/em> population standard deviation <em>\u03c3<\/em> with the <em>known<\/em> sample standard deviation <em>s<\/em> in the formula. 
This is what I did:\r\n\r\n&nbsp;\r\n\r\n$\\sigma_\\overline{x}$ $=\\frac{\\sigma}{\\sqrt{N}}=\\textrm{standard error of the mean}$\r\n\r\n&nbsp;\r\n\r\nSubstituting in <em>s\u00a0<\/em>for <em>\u03c3<\/em> we had:\r\n\r\n&nbsp;\r\n\r\n$\\hat\\sigma_\\overline{x}$ $=s_\\overline{x}$ $=\\frac{s}{\\sqrt{N}}=\\textrm{estimated standard error of the mean}$\r\n\r\n&nbsp;\r\n\r\nSimilarly, for the proportion we had:\r\n\r\n&nbsp;\r\n\r\n$\\sigma_p=\\frac{\\sigma}{\\sqrt{N}}=\\frac{\\sqrt{\\pi(1-\\pi)}}{\\sqrt{N}}=\\sqrt{\\frac{\\pi(1-\\pi)}{N}}=\\textrm{standard error of the proportion}$\r\n\r\n&nbsp;\r\n\r\nand substituting\u00a0<span style=\"font-size: 14pt;text-indent: 18.6667px\">the known sample proportion\u00a0<\/span><em style=\"font-size: 14pt;text-indent: 18.6667px\">p <\/em>for\u00a0<span style=\"text-indent: 1em;font-size: 14pt\">the unknown population proportion\u00a0<\/span><em style=\"text-indent: 1em;font-size: 14pt\">\u03c0 <\/em>in calculating the proportion's variability<span style=\"text-indent: 1em;font-size: 14pt\">, we ended up with:<\/span>\r\n\r\n&nbsp;\r\n\r\n$\\hat\\sigma_p=\\frac{s}{\\sqrt{N}}=\\frac{\\sqrt{p(1-p)}}{\\sqrt{N}}=\\sqrt{\\frac{p(1-p)}{N}}=\\textrm{estimated standard error of the proportion}$\r\n\r\n&nbsp;\r\n\r\nBut why can we do that?\r\n\r\n&nbsp;\r\n\r\nThe more observant of you might have noticed that I swept the explanation for this change under the carpet and simply moved on -- but why should the variability of the population be the same as that of the sample?\r\n\r\n&nbsp;\r\n\r\nIn truth, they are not -- or rather, they <em>might<\/em> be; there's just no way to know. That is, by using the sample statistics to estimate the variability of the population, we introduce more\u00a0<em>uncertainty\u00a0<\/em>in the calculation. When we do that, we actually move away from using the normal distribution and its associated <em>z<\/em>-values. 
What we end up using is something similar, called the <em>t-distribution<\/em>[footnote]Also called the <em>Student<\/em>'s <em>t<\/em>-distribution, after the pseudonym of William Gosset who introduced it to statistics (along with many other concepts). Due to contractual obligations, William Gosset used to publish under the name of \"Student\" (Pagels, 2018). Here you can find more about his curious case: <a href=\"https:\/\/medium.com\/value-stream-design\/the-curious-tale-of-william-sealy-gosset-b3178a9f6ac8.\">https:\/\/medium.com\/value-stream-design\/the-curious-tale-of-william-sealy-gosset-b3178a9f6ac8.<\/a>[\/footnote]: an entire set of bell-shaped curves, accounting for each and every sample size <em>N<\/em>. Figure 6.5 illustrates.\r\n\r\n&nbsp;\r\n\r\n<em>Figure 6.5\u00a0The Normal vs. the t-Distribution<\/em>\r\n\r\n<img src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/uploads\/sites\/564\/2019\/03\/normal-vs-t-1.jpg\" alt=\"\" width=\"673\" height=\"274\" class=\"wp-image-936 aligncenter\" \/>\r\n\r\n&nbsp;\r\n<p style=\"text-indent: 18.6667px\"><strong>The <em>t<\/em>-distribution provides a separate bell-shaped curve for each possible sample size<\/strong>, thus helping us \"ground\", as it were, the estimation in the reality of an actual sample of a specific size.<\/p>\r\n&nbsp;\r\n\r\n<strong>The accommodation of the sample size is done through the concept of <em>degrees of freedom <\/em>(commonly abbreviated to <em>df<\/em>). The degrees of freedom represent the number of values in a statistical calculation that are free to vary. 
In the case of the <em>t<\/em>-distribution, the degrees of freedom are <em>N<\/em>-1 as one degree of freedom is reserved for estimating the mean, and <em>N<\/em>-1 degrees remain for estimating the variability.\u00a0<\/strong>Unlike with <em>z<\/em>-values, where each <em>z<\/em>-value represents a specific probability under the normal curve, the probabilities associated with <em>t<\/em>-values depend on their degrees of freedom.\r\n\r\n&nbsp;\r\n\r\nStill, none of this explains why I was able to shamelessly switch from using the <em>z<\/em>-distribution to the <em>t<\/em>-distribution, without any change to the standard error and confidence interval calculations in the examples in the previous sections. If <em>z<\/em>-values and <em>t<\/em>-values (and their associated probabilities) are different, shouldn't the calculations differ too?\r\n\r\n&nbsp;\r\n\r\nBefore I reassure you that all is well (and it is), let's revisit what<em> z<\/em>-values actually represent. From Chapter 5 you know that the <em>z<\/em>-value is the distance between a case and the mean, expressed in terms of standard deviations (i.e., standardized):\r\n\r\n&nbsp;\r\n\r\n$$z=\\frac{x_i-\\overline{x}}{s}$$\r\n\r\n&nbsp;\r\n\r\nThe reason we were able to use <em>z<\/em>=1,<em> z<\/em>=1.96, and <em>z<\/em>=2.58 in the calculations of the 68%, 95%, and 99% confidence intervals, respectively, was that the sampling distribution is a normal distribution (<span style=\"text-indent: 18.6667px;font-size: 14pt\">per the Central Limit Theorem)<\/span><span style=\"text-indent: 1em;font-size: 14pt\">. 
That is, the <em>z<\/em>-value in this case is the distance between the sample mean (the \"case\" in the sampling distribution) and the population mean (\"the mean of means\", the mean of the sampling distribution), expressed in standard errors (the \"standard deviation\" of the sampling distribution):<\/span>\r\n\r\n&nbsp;\r\n\r\n$$z=\\frac{\\overline{x}-\\mu}{\\sigma_\\overline{x}}$$ [footnote]where $\\sigma_\\overline{x}$ $=\\frac{\\sigma}{\\sqrt{N}}$.[\/footnote]\r\n\r\n&nbsp;\r\n\r\nNow what about <em>t<\/em>? By substituting the sample standard deviation for the population standard deviation, we end up with the <em>estimated<\/em> standard error. In turn, substituting the <em>estimated<\/em> standard error for the standard error in the formula for the <em>z<\/em>-value above, we get\u00a0the <em>t<\/em>-value, the distance between the sample mean and the population mean, expressed in <em>estimated<\/em> standard errors:\r\n\r\n&nbsp;\r\n\r\n$$t=\\frac{\\overline{x}-\\mu}{s_\\overline{x}}$$ [footnote]Where $s_\\overline{x}$ $=\\frac{s}{\\sqrt{N}}$.[\/footnote]\r\n\r\n&nbsp;\r\n\r\nCompare the two formulas for the\u00a0<em>z<\/em>-value and the <em>t<\/em>-value above. As similar as they look, the <em>t<\/em>-value is more \"uncertain\" than the <em>z<\/em>-value, and comes with the aforementioned specification of degrees of freedom. Given specific degrees of freedom, the shape of the <em>t<\/em>-distribution curve changes, and thus the probabilities associated with each <em>t<\/em>-value change too.\r\n\r\n&nbsp;\r\n\r\nFinally, for the drum roll: The reason I was able to work with <em>t<\/em>-values instead of <em>z<\/em>-values in the calculations of confidence intervals in the previous section without acknowledging it is due to the sample sizes I chose for my examples. 
See, <strong>the biggest difference between the <em>z<\/em> and the <em>t<\/em> happens with small <em>N\u00a0<\/em>(especially <em>N<\/em>&lt;30).\u00a0The larger the <em>N<\/em>, the closer and closer the <em>t<\/em>-distribution approaches the <em>z<\/em>-distribution.\u00a0<\/strong>\r\n\r\n&nbsp;\r\n\r\n<span style=\"text-indent: 18.6667px\"><span style=\"font-size: 14pt\">You can see this in Figure 6.5 above: as the degrees of freedom increase, the shape of the distribution becomes more and more normal, so much so that the <em>t<\/em>-distribution at <\/span><em style=\"font-size: 14pt\">df<\/em><span style=\"font-size: 14pt\">=30 is already rendered invisible in the figure, its light blue colour <\/span><span style=\"font-size: 18.6667px\">overridden by the normal distribution's black.<\/span><span style=\"font-size: 14pt\">\u00a0<\/span><span style=\"font-size: 14pt\">And from\u00a0<strong><em>N<\/em>=100 on, the <em>t<\/em>\u00a0converges so fast t<span style=\"text-indent: 18.6667px;font-size: 14pt\">o <em>z<\/em><\/span><\/strong><span style=\"text-indent: 1em;font-size: 14pt\"><strong>, the <em>t<\/em>-distribution curve becomes<\/strong> our old, familiar, beloved <strong>normal curve!<\/strong> (Okay, maybe \"beloved\" applies just to me.)<\/span><\/span><\/span>\r\n\r\n&nbsp;\r\n\r\n<span style=\"text-indent: 1em;font-size: 14pt\">Given that in the confidence interval examples in the few preceding sections I used only large <em>N<\/em>'s (=900 and above), the probabilities associated with the <em>t<\/em>-value at <em>N<\/em>-1 degrees of freedom (=899 and above) were the same as those associated with the <em>z<\/em>-values: 68% for <em>t=z<\/em>=1, 95% for <em>t=z<\/em>=1.96, 99% for <em>t=z<\/em>=2.58. (Hence I left them out of the discussion at that time to properly explain here.)<\/span>\r\n\r\n&nbsp;\r\n\r\n<em>Hmm, much ado about nothing<\/em>, I can imagine you saying at this point. 
If the <em>t<\/em>-distribution and the <em>z<\/em>-distribution are no different at larger <em>N,<\/em> why even bother with the <em>t\u00a0<\/em>(beyond any small-<em>N<\/em> uses)? And as unsatisfying as the answer \"I'll explain later\" is, I'm afraid I have no choice but to resort to it, again. Briefly, it has to do with something called a <em>t<\/em>-<em>test for significance<\/em> which we will be using soon enough for hypothesis testing in Chapter 7, next.\r\n\r\n&nbsp;\r\n\r\nFor now, what you should take away from this section is that <strong>the <em>t<\/em>-distribution exists, and it is what we actually use for estimation (and not<em> z<\/em>!), given a specific sample size. <\/strong>As well, remember that<strong> for <em>N<\/em>=100 and above, <em>t<\/em> converges to <em>z<\/em>\u00a0so you can readily apply any probabilities you associate with <em>z<\/em> to <em>t<\/em> with<em> N<\/em>-1 <em>df<\/em>.\u00a0<\/strong>(Regarding the latter, <strong>do not forget to always specify the degrees of freedom for whatever <em>t<\/em> you might have. A <em>t<\/em>-value <em>always<\/em> comes with <em>df<\/em> attached as it's meaningless\/undefined without them.<\/strong>)","rendered":"<p>If, having reached this chapter&#8217;s final section, after all we had been through, random sampling, sampling distribution, CLT, parameters, estimates, statistics, confidence intervals, you are now groaning in dismay &#8212; <em>why is there even more<\/em><span style=\"text-indent: 1em;font-size: 14pt\">\u00a0<em>to this topic??<\/em><\/span><span style=\"font-size: 14pt\"><a class=\"footnote\" title=\"As a general principle, in introductory texts such as this there is\u00a0always\u00a0more. 
Much, much more; it's not a matter of\u00a0if\u00a0but of\u00a0how much\u00a0something is left out.\" id=\"return-footnote-103-1\" href=\"#footnote-103-1\" aria-label=\"Footnote 1\"><sup class=\"footnote\">[1]<\/sup><\/a><\/span><span style=\"text-indent: 1em;font-size: 14pt\">\u00a0&#8212; take heart, this is a short explanation I kept for last, by way of a brief introduction to a new concept.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p>If you recall, when we needed to calculate the standard error of the mean (or proportion) in the previous few sections, I simply replaced the <em>unknown<\/em> population standard deviation <em>\u03c3<\/em> with the <em>known<\/em> sample standard deviation <em>s<\/em> in the formula. This is what I did:<\/p>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-de5382c00a55332dd89774492d104d0c_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#115;&#105;&#103;&#109;&#97;&#95;&#92;&#111;&#118;&#101;&#114;&#108;&#105;&#110;&#101;&#123;&#120;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"11\" width=\"19\" style=\"vertical-align: -3px;\" \/> <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-873ed97a83ca231463d997dfc01c908f_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#61;&#92;&#102;&#114;&#97;&#99;&#123;&#92;&#115;&#105;&#103;&#109;&#97;&#125;&#123;&#92;&#115;&#113;&#114;&#116;&#123;&#78;&#125;&#125;&#61;&#92;&#116;&#101;&#120;&#116;&#114;&#109;&#123;&#115;&#116;&#97;&#110;&#100;&#97;&#114;&#100;&#32;&#101;&#114;&#114;&#111;&#114;&#32;&#111;&#102;&#32;&#116;&#104;&#101;&#32;&#109;&#101;&#97;&#110;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"24\" width=\"280\" style=\"vertical-align: -11px;\" \/><\/p>\n<p>&nbsp;<\/p>\n<p>Substituting in <em>s\u00a0<\/em>for <em>\u03c3<\/em> we 
had:<\/p>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-785ae6e249be46641e4a402bd65210a5_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#104;&#97;&#116;&#92;&#115;&#105;&#103;&#109;&#97;&#95;&#92;&#111;&#118;&#101;&#114;&#108;&#105;&#110;&#101;&#123;&#120;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"16\" width=\"19\" style=\"vertical-align: -3px;\" \/> <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-1f908a1fb1b7a5465f07a0ace257e8ab_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#61;&#115;&#95;&#92;&#111;&#118;&#101;&#114;&#108;&#105;&#110;&#101;&#123;&#120;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"11\" width=\"36\" style=\"vertical-align: -3px;\" \/> <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-f0a879790dfa4e8fc5eed8181ecb5ebd_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#61;&#92;&#102;&#114;&#97;&#99;&#123;&#115;&#125;&#123;&#92;&#115;&#113;&#114;&#116;&#123;&#78;&#125;&#125;&#61;&#92;&#116;&#101;&#120;&#116;&#114;&#109;&#123;&#101;&#115;&#116;&#105;&#109;&#97;&#116;&#101;&#100;&#32;&#115;&#116;&#97;&#110;&#100;&#97;&#114;&#100;&#32;&#101;&#114;&#114;&#111;&#114;&#32;&#111;&#102;&#32;&#116;&#104;&#101;&#32;&#109;&#101;&#97;&#110;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"24\" width=\"361\" style=\"vertical-align: -11px;\" \/><\/p>\n<p>&nbsp;<\/p>\n<p>Similarly, for the proportion we had<\/p>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-a12ac52904ec32c5deabf1189fe13553_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" 
alt=\"&#92;&#115;&#105;&#103;&#109;&#97;&#95;&#112;&#61;&#92;&#102;&#114;&#97;&#99;&#123;&#92;&#115;&#105;&#103;&#109;&#97;&#125;&#123;&#92;&#115;&#113;&#114;&#116;&#123;&#78;&#125;&#125;&#61;&#92;&#102;&#114;&#97;&#99;&#123;&#92;&#115;&#113;&#114;&#116;&#123;&#92;&#112;&#105;&#40;&#49;&#45;&#92;&#112;&#105;&#41;&#125;&#125;&#123;&#92;&#115;&#113;&#114;&#116;&#123;&#78;&#125;&#125;&#61;&#92;&#115;&#113;&#114;&#116;&#123;&#92;&#102;&#114;&#97;&#99;&#123;&#92;&#112;&#105;&#40;&#49;&#45;&#92;&#112;&#105;&#41;&#125;&#123;&#78;&#125;&#125;&#61;&#92;&#116;&#101;&#120;&#116;&#114;&#109;&#123;&#115;&#116;&#97;&#110;&#100;&#97;&#114;&#100;&#32;&#101;&#114;&#114;&#111;&#114;&#32;&#111;&#102;&#32;&#116;&#104;&#101;&#32;&#112;&#114;&#111;&#112;&#111;&#114;&#116;&#105;&#111;&#110;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"35\" width=\"522\" style=\"vertical-align: -11px;\" \/><\/p>\n<p>&nbsp;<\/p>\n<p>and substituting\u00a0<span style=\"font-size: 14pt;text-indent: 18.6667px\">the known sample proportion\u00a0<\/span><em style=\"font-size: 14pt;text-indent: 18.6667px\">p <\/em>for\u00a0<span style=\"text-indent: 1em;font-size: 14pt\">the unknown population proportion\u00a0<\/span><em style=\"text-indent: 1em;font-size: 14pt\">\u03c0 <\/em>in calculating the proportion&#8217;s variability<span style=\"text-indent: 1em;font-size: 14pt\">, we ended up with:<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-809f26814b436601c1a690d46a976e8e_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" 
alt=\"&#92;&#104;&#97;&#116;&#92;&#115;&#105;&#103;&#109;&#97;&#95;&#112;&#61;&#92;&#102;&#114;&#97;&#99;&#123;&#92;&#115;&#105;&#103;&#109;&#97;&#125;&#123;&#92;&#115;&#113;&#114;&#116;&#123;&#78;&#125;&#125;&#61;&#92;&#102;&#114;&#97;&#99;&#123;&#92;&#115;&#113;&#114;&#116;&#123;&#112;&#40;&#49;&#45;&#112;&#41;&#125;&#125;&#123;&#92;&#115;&#113;&#114;&#116;&#123;&#78;&#125;&#125;&#61;&#92;&#115;&#113;&#114;&#116;&#123;&#92;&#102;&#114;&#97;&#99;&#123;&#112;&#40;&#49;&#45;&#112;&#41;&#125;&#123;&#78;&#125;&#125;&#61;&#92;&#116;&#101;&#120;&#116;&#114;&#109;&#123;&#101;&#115;&#116;&#105;&#109;&#97;&#116;&#101;&#100;&#32;&#115;&#116;&#97;&#110;&#100;&#97;&#114;&#100;&#32;&#101;&#114;&#114;&#111;&#114;&#32;&#111;&#102;&#32;&#116;&#104;&#101;&#32;&#112;&#114;&#111;&#112;&#111;&#114;&#116;&#105;&#111;&#110;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"51\" width=\"582\" style=\"vertical-align: -3px;\" \/><\/p>\n<p>&nbsp;<\/p>\n<p>But why can we do that?<\/p>\n<p>&nbsp;<\/p>\n<p>The more observant of you might have noticed that I swept the explanation for this change under the carpet and simply moved on &#8212; but why should the variability of the population be the same as the sample?<\/p>\n<p>&nbsp;<\/p>\n<p>In truth, they are not &#8212; or rather, they <em>might<\/em> be; there&#8217;s just no way to know. That is, by using the sample statistics to estimate the variability of the population, we introduce more\u00a0<em>uncertainty\u00a0<\/em>in the calculation. When we do that, we actually move away from using the normal distribution and its associated <em>z<\/em>-values. What we end up using is something similar, called the <em>t-distribution<\/em><a class=\"footnote\" title=\"Also called the Student's t-distribution, after the pseudonym of William Gosset who introduced it to statistics (along with many other concepts). Due to contractual obligations, William Gosset used to publish under the name of &quot;Student&quot; (Pagels, 2018). 
Here you can find more about his curious case: https:\/\/medium.com\/value-stream-design\/the-curious-tale-of-william-sealy-gosset-b3178a9f6ac8.\" id=\"return-footnote-103-2\" href=\"#footnote-103-2\" aria-label=\"Footnote 2\"><sup class=\"footnote\">[2]<\/sup><\/a>: an entire set of bell-shaped curves, accounting for each and every sample size <em>N<\/em>. Figure 6.5 illustrates.<\/p>\n<p>&nbsp;<\/p>\n<p><em>Figure 6.5\u00a0The Normal vs. the t-Distribution<\/em><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/uploads\/sites\/564\/2019\/03\/normal-vs-t-1.jpg\" alt=\"\" width=\"673\" height=\"274\" class=\"wp-image-936 aligncenter\" srcset=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/uploads\/sites\/564\/2019\/03\/normal-vs-t-1.jpg 871w, https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/uploads\/sites\/564\/2019\/03\/normal-vs-t-1-300x122.jpg 300w, https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/uploads\/sites\/564\/2019\/03\/normal-vs-t-1-768x312.jpg 768w, https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/uploads\/sites\/564\/2019\/03\/normal-vs-t-1-65x26.jpg 65w, https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/uploads\/sites\/564\/2019\/03\/normal-vs-t-1-225x91.jpg 225w, https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/uploads\/sites\/564\/2019\/03\/normal-vs-t-1-350x142.jpg 350w\" sizes=\"auto, (max-width: 673px) 100vw, 673px\" \/><\/p>\n<p>&nbsp;<\/p>\n<p style=\"text-indent: 18.6667px\"><strong>The <em>t<\/em>-distribution provides a separate bell-shaped curve for each possible sample size<\/strong>, thus helping us &#8220;ground&#8221;, as it were, the estimation in the reality of an actual sample of a specific size.<\/p>\n<p>&nbsp;<\/p>\n<p><strong>The accommodation of the sample size is done through the concept of <em>degrees of freedom <\/em>(commonly abbreviated to <em>df<\/em>). 
The degrees of freedom represent the number of values in a statistical calculation that are free to vary. In the case of the <em>t<\/em>-distribution, the degrees of freedom are <em>N<\/em>-1 as one degree of freedom is reserved for estimating the mean, and <em>N<\/em>-1 degrees remain for estimating the variability.\u00a0<\/strong>Unlike with <em>z<\/em>-values, where each <em>z<\/em>-value represents a specific probability under the normal curve, the probabilities associated with <em>t<\/em>-values depend on their degrees of freedom.<\/p>\n<p>&nbsp;<\/p>\n<p>Still, none of this explains why I was able to shamelessly switch from using the <em>z<\/em>-distribution to the <em>t<\/em>-distribution, without any change to the standard error and confidence interval calculations in the examples in the previous sections. If <em>z<\/em>-values and <em>t<\/em>-values (and their associated probabilities) are different, shouldn&#8217;t the calculations differ too?<\/p>\n<p>&nbsp;<\/p>\n<p>Before I reassure you that all is well (and it is), let&#8217;s revisit what<em> z<\/em>-values actually represent. 
From Chapter 5 you know that the <em>z<\/em>-value is the distance between a case and the mean, expressed in terms of standard deviations (i.e., standardized):<\/p>\n<p>&nbsp;<\/p>\n<p class=\"ql-center-displayed-equation\" style=\"line-height: 35px;\"><span class=\"ql-right-eqno\"> &nbsp; <\/span><span class=\"ql-left-eqno\"> &nbsp; <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-7bb9bffb532f78ef46d182a25a8141d6_l3.png\" height=\"35\" width=\"83\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"&#92;&#091;&#122;&#61;&#92;&#102;&#114;&#97;&#99;&#123;&#120;&#95;&#105;&#45;&#92;&#111;&#118;&#101;&#114;&#108;&#105;&#110;&#101;&#123;&#120;&#125;&#125;&#123;&#115;&#125;&#92;&#093;\" title=\"Rendered by QuickLaTeX.com\" \/><\/p>\n<p>&nbsp;<\/p>\n<p>The reason we were able to use <em>z<\/em>=1,<em> z<\/em>=1.96, and <em>z<\/em>=2.58 in the calculations of the 68%, 95%, and 99% confidence intervals, respectively, was that the sampling distribution is a normal distribution (<span style=\"text-indent: 18.6667px;font-size: 14pt\">per the Central Limit Theorem)<\/span><span style=\"text-indent: 1em;font-size: 14pt\">. 
That is, the <em>z<\/em>-value in this case is the distance between the sample mean (the &#8220;case&#8221; in the sampling distribution) and the population mean (&#8220;the mean of means&#8221;, the mean of the sampling distribution), expressed in standard errors (the &#8220;standard deviation&#8221; of the sampling distribution):<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"ql-center-displayed-equation\" style=\"line-height: 37px;\"><span class=\"ql-right-eqno\"> &nbsp; <\/span><span class=\"ql-left-eqno\"> &nbsp; <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-a301ef5bee38821a2d001f3210733249_l3.png\" height=\"37\" width=\"78\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"&#92;&#091;&#122;&#61;&#92;&#102;&#114;&#97;&#99;&#123;&#92;&#111;&#118;&#101;&#114;&#108;&#105;&#110;&#101;&#123;&#120;&#125;&#45;&#92;&#109;&#117;&#125;&#123;&#92;&#115;&#105;&#103;&#109;&#97;&#95;&#92;&#111;&#118;&#101;&#114;&#108;&#105;&#110;&#101;&#123;&#120;&#125;&#125;&#92;&#093;\" title=\"Rendered by QuickLaTeX.com\" \/><\/p>\n<p> <a class=\"footnote\" title=\"where  .\" id=\"return-footnote-103-3\" href=\"#footnote-103-3\" aria-label=\"Footnote 3\"><sup class=\"footnote\">[3]<\/sup><\/a><\/p>\n<p>&nbsp;<\/p>\n<p>Now what about <em>t<\/em>? By substituting the sample standard deviation for the population standard deviation, we end up with the <em>estimated<\/em> standard error. 
In turn, substituting the <em>estimated<\/em> standard error for the standard error in the formula for the <em>z<\/em>-value above, we get\u00a0the <em>t<\/em>-value, the distance between the sample mean and the population mean, expressed in <em>estimated<\/em> standard errors:<\/p>\n<p>&nbsp;<\/p>\n<p class=\"ql-center-displayed-equation\" style=\"line-height: 37px;\"><span class=\"ql-right-eqno\"> &nbsp; <\/span><span class=\"ql-left-eqno\"> &nbsp; <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-25e3f98e722f4d2c1a4172301778750d_l3.png\" height=\"37\" width=\"75\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"&#92;&#091;&#116;&#61;&#92;&#102;&#114;&#97;&#99;&#123;&#92;&#111;&#118;&#101;&#114;&#108;&#105;&#110;&#101;&#123;&#120;&#125;&#45;&#92;&#109;&#117;&#125;&#123;&#115;&#95;&#92;&#111;&#118;&#101;&#114;&#108;&#105;&#110;&#101;&#123;&#120;&#125;&#125;&#92;&#093;\" title=\"Rendered by QuickLaTeX.com\" \/><\/p>\n<p> <a class=\"footnote\" title=\"Where  .\" id=\"return-footnote-103-4\" href=\"#footnote-103-4\" aria-label=\"Footnote 4\"><sup class=\"footnote\">[4]<\/sup><\/a><\/p>\n<p>&nbsp;<\/p>\n<p>Compare the two formulas for the\u00a0<em>z<\/em>-value and the <em>t<\/em>-value above. As similar as they look, the <em>t<\/em>-value is more &#8220;uncertain&#8221; than the <em>z<\/em>-value, and comes with the aforementioned specification of degrees of freedom. Given specific degrees of freedom, the shape of the <em>t<\/em>-distribution curve changes, and thus the probabilities associated with each <em>t<\/em>-value change too.<\/p>\n<p>&nbsp;<\/p>\n<p>Finally, for the drum roll: The reason I was able to work with <em>t<\/em>-values instead of <em>z<\/em>-values in the calculations of confidence intervals in the previous section without acknowledging it is due to the sample sizes I chose for my examples. 
See, <strong>the biggest difference between the <em>z<\/em> and the <em>t<\/em> happens with small <em>N\u00a0<\/em>(especially <em>N<\/em>&lt;30).\u00a0The larger the <em>N<\/em>, the closer and closer the <em>t<\/em>-distribution approaches the <em>z<\/em>-distribution.\u00a0<\/strong><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"text-indent: 18.6667px\"><span style=\"font-size: 14pt\">You can see this in Figure 6.5 above: as the degrees of freedom increase, the shape of the distribution becomes more and more normal, so much so that the <em>t<\/em>-distribution at <\/span><em style=\"font-size: 14pt\">df<\/em><span style=\"font-size: 14pt\">=30 is already rendered invisible in the figure, its light blue colour <\/span><span style=\"font-size: 18.6667px\">overridden by the normal distribution&#8217;s black.<\/span><span style=\"font-size: 14pt\">\u00a0<\/span><span style=\"font-size: 14pt\">And from\u00a0<strong><em>N<\/em>=100 on, the <em>t<\/em>\u00a0converges so fast t<span style=\"text-indent: 18.6667px;font-size: 14pt\">o <em>z<\/em><\/span><\/strong><span style=\"text-indent: 1em;font-size: 14pt\"><strong>, the <em>t<\/em>-distribution curve becomes<\/strong> our old, familiar, beloved <strong>normal curve!<\/strong> (Okay, maybe &#8220;beloved&#8221; applies just to me.)<\/span><\/span><\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"text-indent: 1em;font-size: 14pt\">Given that in the confidence interval examples in the few preceding sections I used only large <em>N<\/em>&#8216;s (=900 and above), the probabilities associated with the <em>t<\/em>-value at <em>N<\/em>-1 degrees of freedom (=899 and above) were the same as those associated with the <em>z<\/em>-values: 68% for <em>t=z<\/em>=1, 95% for <em>t=z<\/em>=1.96, 99% for <em>t=z<\/em>=2.58. (Hence I left them out of the discussion at that time to properly explain here.)<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><em>Hmm, much ado about nothing<\/em>, I can imagine you saying at this point. 
If the <em>t<\/em>-distribution and the <em>z<\/em>-distribution are no different at larger <em>N,<\/em> why even bother with the <em>t\u00a0<\/em>(beyond any small-<em>N<\/em> uses)? And as unsatisfying as the answer &#8220;I&#8217;ll explain later&#8221; is, I&#8217;m afraid I have no choice but to resort to it, again. Briefly, it has to do with something called a <em>t<\/em>-<em>test for significance<\/em> which we will be using soon enough for hypothesis testing in Chapter 7, next.<\/p>\n<p>&nbsp;<\/p>\n<p>For now, what you should take away from this section is that <strong>the <em>t<\/em>-distribution exists, and it is what we actually use for estimation (and not<em> z<\/em>!), given a specific sample size. <\/strong>As well, remember that<strong> for <em>N<\/em>=100 and above, <em>t<\/em> converges to <em>z<\/em>\u00a0so you can readily apply any probabilities you associate with <em>z<\/em> to <em>t<\/em> with<em> N<\/em>-1 <em>df<\/em>.\u00a0<\/strong>(Regarding the latter, <strong>do not forget to always specify the degrees of freedom for whatever <em>t<\/em> you might have. A <em>t<\/em>-value <em>always<\/em> comes with <em>df<\/em> attached as it&#8217;s meaningless\/undefined without them.<\/strong>)<\/p>\n<hr class=\"before-footnotes clear\" \/><div class=\"footnotes\"><ol><li id=\"footnote-103-1\">As a general principle, in introductory texts such as this there is\u00a0<\/span><em style=\"font-size: 14pt\">always<\/em><span style=\"font-size: 14pt\">\u00a0more. Much, much more; it's not a matter of\u00a0<\/span><em style=\"font-size: 14pt\">if<\/em><span style=\"font-size: 14pt\">\u00a0<\/span><span style=\"font-size: 14pt\">but of\u00a0<\/span><em style=\"font-size: 14pt\">how much\u00a0<\/em><span style=\"font-size: 14pt\">something is left out<\/span><span style=\"font-size: 14pt\">.  
<a href=\"#return-footnote-103-1\" class=\"return-footnote\" aria-label=\"Return to footnote 1\">&crarr;<\/a><\/li><li id=\"footnote-103-2\">Also called the <em>Student<\/em>'s <em>t<\/em>-distribution, after the pseudonym of William Gosset who introduced it to statistics (along with many other concepts). Due to contractual obligations, William Gosset used to publish under the name of \"Student\" (Pagels, 2018). Here you can find more about his curious case: <a href=\"https:\/\/medium.com\/value-stream-design\/the-curious-tale-of-william-sealy-gosset-b3178a9f6ac8.\">https:\/\/medium.com\/value-stream-design\/the-curious-tale-of-william-sealy-gosset-b3178a9f6ac8.<\/a> <a href=\"#return-footnote-103-2\" class=\"return-footnote\" aria-label=\"Return to footnote 2\">&crarr;<\/a><\/li><li id=\"footnote-103-3\">where <img src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-de5382c00a55332dd89774492d104d0c_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#115;&#105;&#103;&#109;&#97;&#95;&#92;&#111;&#118;&#101;&#114;&#108;&#105;&#110;&#101;&#123;&#120;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"11\" width=\"19\" style=\"vertical-align: -3px;\" \/> <img src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-ff1cbb6cd1d0399c05c18824c0141efd_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#61;&#92;&#102;&#114;&#97;&#99;&#123;&#92;&#115;&#105;&#103;&#109;&#97;&#125;&#123;&#92;&#115;&#113;&#114;&#116;&#123;&#78;&#125;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"24\" width=\"45\" style=\"vertical-align: -11px;\" \/>. 
<a href=\"#return-footnote-103-3\" class=\"return-footnote\" aria-label=\"Return to footnote 3\">&crarr;<\/a><\/li><li id=\"footnote-103-4\">Where <img src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-5ee7b9a8ecba4b54a50c56e76a5e2ff1_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#115;&#95;&#92;&#111;&#118;&#101;&#114;&#108;&#105;&#110;&#101;&#123;&#120;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"11\" width=\"17\" style=\"vertical-align: -3px;\" \/> <img src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-0fe4258536c8a7f1895bd559d297d2c2_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#61;&#92;&#102;&#114;&#97;&#99;&#123;&#115;&#125;&#123;&#92;&#115;&#113;&#114;&#116;&#123;&#78;&#125;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"24\" width=\"45\" style=\"vertical-align: -11px;\" \/>. <a href=\"#return-footnote-103-4\" class=\"return-footnote\" aria-label=\"Return to footnote 
4\">&crarr;<\/a><\/li><\/ol><\/div>","protected":false},"author":533,"menu_order":10,"template":"","meta":{"pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[],"contributor":[],"license":[],"class_list":["post-103","chapter","type-chapter","status-publish","hentry"],"part":32,"_links":{"self":[{"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/pressbooks\/v2\/chapters\/103","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/wp\/v2\/users\/533"}],"version-history":[{"count":25,"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/pressbooks\/v2\/chapters\/103\/revisions"}],"predecessor-version":[{"id":2056,"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/pressbooks\/v2\/chapters\/103\/revisions\/2056"}],"part":[{"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/pressbooks\/v2\/parts\/32"}],"metadata":[{"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/pressbooks\/v2\/chapters\/103\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/wp\/v2\/media?parent=103"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/pressbooks\/v2\/chapter-type?post=103"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/wp\/v2\/contributor?post=103"},{"taxonomy":"license","embeddable":true,"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/wp\/v2\/license?post=103"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}