{"id":701,"date":"2019-03-01T18:24:34","date_gmt":"2019-03-01T23:24:34","guid":{"rendered":"https:\/\/pressbooks.bccampus.ca\/simplestats\/?post_type=chapter&#038;p=701"},"modified":"2019-10-18T18:17:25","modified_gmt":"2019-10-18T22:17:25","slug":"6-4-parameters-statistics-and-estimators","status":"publish","type":"chapter","link":"https:\/\/pressbooks.bccampus.ca\/simplestats\/chapter\/6-4-parameters-statistics-and-estimators\/","title":{"raw":"6.4 Parameters, Statistics, and Estimators","rendered":"6.4 Parameters, Statistics, and Estimators"},"content":{"raw":"[latexpage]\r\n\r\nThe logic underlying statistical inference is that we want to know something about a population of interest but, since we cannot know it directly, what we do is study a subgroup of that population. Based on what we learn\/know about the subgroup, we can then <em>estimate<\/em> (i.e., infer) things about the population. In the previous section, we already established that not any subgroup of the population will do -- what we need is a <em>randomly<\/em> selected sample, created through one of the random sampling methods I listed (simple, systematic, stratified, and cluster). What we do is <strong>collect data from\/about elements of a <em>sample<\/em> (e.g., respondents) with the explicit goal of finding something and drawing conclusions about a <em>population<\/em><\/strong>. (Again, we can do that due to the fact that random sampling allows us to use probability theory through the normal curve.)\r\n\r\n&nbsp;\r\n\r\nSaying we want to find \"something\" about the population of interest is hardly formal (much less precise) terminology but I wanted to get the message across before I introduced you to the proper statistics jargon. Let's do that now.\r\n\r\n&nbsp;\r\n\r\nPopulations have <em>parameters<\/em> and samples have <em>statistics<\/em>. 
<strong>We describe populations with their <em>parameters<\/em> while we describe samples with their <em>statistics<\/em>.<\/strong> When we study something, we are interested in the parameters of the population; in most cases, however, it is difficult to collect the information needed to calculate them. What we do instead is <strong>take a random sample of the population and calculate the <em>sample's statistics<\/em>. We then use the sample statistics to <em>estimate <\/em>(i.e., infer) the <em>population parameters<\/em>.<\/strong> Thus, sample statistics used in this way are also called <em>estimators<\/em> of population parameters.\r\n\r\n&nbsp;\r\n\r\nFor example, if we want to know the average age of Canadians, we could either do a census and ask everyone or simply take a nationally representative sample. Considering how expensive and time-consuming it would be to ask all 36.7 million Canadians (and Statistics Canada conducts the official census only every five years), we can poll a random selection of people across Canada, calculate their average age, and use <em>that<\/em> as an <em>estimate<\/em> of the average age of all Canadians[footnote]When people who have no statistics background learn of this, they usually protest that the information is not accurate because it's not based on <em>everyone<\/em>. What you will learn in this chapter is that you don't <em>need<\/em> everyone: a sample is enough because, for random samples of sufficient size, it can be shown mathematically that the sample statistics tend to fall close to the population parameters they estimate. To the extent that there is a difference between a statistic and the parameter it estimates, this difference is accounted for by reporting levels of certainty\/confidence. 
More on that later.[\/footnote].\r\n\r\n&nbsp;\r\n\r\nIn this example, the average age calculated based on the people in the sample is the <em>statistic<\/em> which we use to <em>estimate<\/em> the average age of all Canadians, the population <em>parameter<\/em>. All measures of central tendency and dispersion describing variables based on sample data are statistics. On the other hand, if we had data from the entire population when calculating measures of central tendency and dispersion, we would have parameters.\r\n\r\n&nbsp;\r\n\r\nConsider, if you will, the examples I have used in past chapters: whenever the example was based on actual data from a dataset, and SPSS was used, this was sample data producing statistics[footnote]All datasets used in this book are nationally representative data collected by Statistics Canada.[\/footnote]. Even if we haven't used statistics in this way yet, <span style=\"text-indent: 18.6667px;font-size: 14pt\">they <em>can<\/em> be used to estimate things about Canadians as a whole. On the other hand, the examples using hypothetical (imaginary) data about \"your friends\", \"your classmates\", \"hours you have worked per week\", etc. can be considered population data, as we imagine we have all the information about those things, and there's nothing to estimate.\u00a0\u00a0<\/span>\r\n\r\n&nbsp;\r\n\r\nA final note concerns formal notation. <strong>To differentiate between statistics and parameters, we designate sample statistics by <em>Latin<\/em> letters but we denote population parameters by <em>Greek<\/em> letters.<\/strong>\r\n\r\n&nbsp;\r\n\r\nYou have already seen a ready-made example for this rule: recall our discussion on variance and standard deviation. 
In Section 4.4 (<a href=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/chapter\/4-4-standard-deviation\/\">https:\/\/pressbooks.bccampus.ca\/simplestats\/chapter\/4-4-standard-deviation\/<\/a>) I introduced formulas for\u00a0<em>\u03c3<\/em> and\u00a0<em>\u03c3<sup>2<\/sup>\u00a0<\/em>and I mentioned (without much explanation) that another \"version\" of these exist as\u00a0<em>s<\/em> and <em>s<sup>2<\/sup><\/em>. In truth, when we calculated the variance and the standard deviation with the hypothetical data in the examples, we needed the <em>population<\/em> standard deviation and variance (i.e., <em>\u03c3<\/em> and\u00a0<em>\u03c3<sup>2<\/sup><\/em>, respectively); but when we use SPSS with a dataset (i.e., sample data), we need the <em>sample<\/em> standard deviation and variance (i.e.,\u00a0<em>s<\/em> and <em>s<sup>2<\/sup><\/em>, respectively). Here they are again:\r\n\r\n&nbsp;\r\n\r\n$$\\frac{\\sum\\limits_{i=1}^{N}{(x_i-\\overline{x})^2}}{N} = \\sigma^2 =\\textrm{population variance}$$\r\n\r\n&nbsp;\r\n\r\n$$\\sqrt{\\frac{\\sum\\limits_{i=1}^{N}{(x_i-\\overline{x})^2}}{N}} = \\sqrt{\\sigma^2}=\\sigma=\\textrm{population standard deviation}$$\r\n\r\n&nbsp;\r\n\r\n$$\\frac{\\sum\\limits_{i=1}^{N}{(x_i-\\overline{x})^2}}{N-1} = s^2\u00a0=\\textrm{sample variance}$$\r\n\r\n&nbsp;\r\n\r\n<span style=\"text-indent: 18.6667px\">$$\\sqrt{\\frac{\\sum\\limits_{i=1}^{N}{(x_i-\\overline{x})^2}}{N-1}} = \\sqrt{s^2}=s=\\textrm{sample standard deviation}$$\u00a0<\/span>\r\n\r\n&nbsp;\r\n\r\nI'll take this opportunity to finally explain why we need the difference in the formulas (i.e., to divide by <em>N-1<\/em> in the <em>sample<\/em> formulas but by <em>N<\/em> in the <em>population<\/em> formulas). 
A sample statistic <em>estimates<\/em> a population parameter but will rarely equal it exactly -- some uncertainty exists, as inference is not a perfect \"guess\". In the case of the variance, the deviations are measured from the sample mean, which is at least as close to the sample's own values as the population mean is; dividing the sum of squared deviations by <em>N<\/em> would therefore tend to underestimate the population variance, i.e., it would be a biased estimation. Thus, the <em>N-1<\/em> is meant to correct that bias[footnote]This is called <em>Bessel's correction<\/em>, by the name of Friedrich Bessel who introduced it.[\/footnote] (which it does fully for the variance, and only to an extent for the standard deviation). <strong>What we have then is that <em>s<sup>2<\/sup><\/em> is an unbiased estimator of\u00a0<em>\u03c3<sup>2<\/sup><\/em>, while <em>s<\/em> is the conventional (slightly biased) estimator of\u00a0<em>\u03c3<\/em>.<\/strong>\r\n\r\n&nbsp;\r\n\r\nThus it should be clear why we use the <em>s<\/em> and <em>s<sup>2<\/sup><\/em>\u00a0 formulas when working with datasets and SPSS -- as the actual data has been collected from respondents randomly selected from\u00a0<span style=\"text-indent: 18.6667px;font-size: 14pt\">a population of interest and comprising\u00a0<\/span><span style=\"text-indent: 1em;font-size: 14pt\">a sample of specific size. On the other hand, when we have data about everyone\/everything we're interested in (like in the small-scale examples with made-up data), we have a <\/span><em style=\"text-indent: 1em;font-size: 14pt\">de facto<\/em><span style=\"text-indent: 1em;font-size: 14pt\"> population on our hands -- hence the\u00a0<\/span><em style=\"text-indent: 1em;font-size: 14pt\">\u03c3<\/em><span style=\"text-indent: 1em;font-size: 14pt\"> and\u00a0<\/span><em style=\"text-indent: 1em;font-size: 14pt\">\u03c3<sup>2<\/sup> <\/em><span style=\"text-indent: 1em;font-size: 14pt\">formulas are appropriate. 
In the former case, the findings can be extrapolated to the population (acknowledging that we are dealing with inferred estimates); in the latter case, there is nothing further to extrapolate as we are calculating the parameters directly.<\/span>\r\n\r\n&nbsp;\r\n\r\nAnother important parameter to note (as we will be using it a lot from now on) is the population mean, designated by the small-case Greek letter for <em>m<\/em>\u00a0(from <em>mean<\/em>) -- <em>\u03bc<\/em>, which I introduced in Section 5.1.2 (<a href=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/chapter\/5-1-2-the-z-value\/\">https:\/\/pressbooks.bccampus.ca\/simplestats\/chapter\/5-1-2-the-z-value\/<\/a>) without giving you a reason why. Unlike the correspondence between <em>s<\/em> and\u00a0<em>\u03c3<\/em>, however, we don't usually denote the sample mean with an <em>m<\/em>; as you know, we use $\\overline{x}$ instead (so that we know which variable's mean we have in mind).\r\n\r\n&nbsp;\r\n\r\nFinally, the <em>estimate<\/em> of a parameter is designated by a \"hat\" on top of the parameter's symbol: for example, if we have a sample statistic called <em>a<\/em> estimating a population parameter\u00a0<em>\u03b1<\/em>[footnote]This is the small-case Greek letter <em>a<\/em>: <em>\u03b1<\/em>, pronounced \"AL-pha\".[\/footnote], the estimated\u00a0<em>\u03b1<\/em> will be $\\hat{\\alpha}$, pronounced \"alpha-hat\". By analogy, if a statistic <em>b<\/em> estimates a parameter\u00a0<em>\u03b2<\/em>[footnote]This is the small-case Greek letter <em>b<\/em>: <em>\u03b2<\/em>, pronounced \"BAY-ta\".[\/footnote], the estimated\u00a0<em>\u03b2<\/em> will be $\\hat{\\beta}$, pronounced \"beta-hat\".\r\n\r\n&nbsp;\r\n\r\nThus, the logic of inference tells us that while <em>a<\/em> = $\\hat{\\alpha}$ and <em>b<\/em> = $\\hat{\\beta}$ (i.e., the statistics are estimators for the parameters),\u00a0<em>a<\/em> = $\\hat{\\alpha}\\neq\\alpha$ and <em>b<\/em> = $\\hat{\\beta}\\neq\\beta$. 
<strong>That is, the statistics <\/strong>(a.k.a. estimators)<strong> are not the same as the parameters.<\/strong> More on this, next.","rendered":"<p>The logic underlying statistical inference is that we want to know something about a population of interest but, since we cannot know it directly, what we do is study a subgroup of that population. Based on what we learn\/know about the subgroup, we can then <em>estimate<\/em> (i.e., infer) things about the population. In the previous section, we already established that not any subgroup of the population will do &#8212; what we need is a <em>randomly<\/em> selected sample, created through one of the random sampling methods I listed (simple, systematic, stratified, and cluster). What we do is <strong>collect data from\/about elements of a <em>sample<\/em> (e.g., respondents) with the explicit goal of finding something and drawing conclusions about a <em>population<\/em><\/strong>. (Again, we can do that due to the fact that random sampling allows us to use probability theory through the normal curve.)<\/p>\n<p>&nbsp;<\/p>\n<p>Saying we want to find &#8220;something&#8221; about the population of interest is hardly formal (much less precise) terminology but I wanted to get the message across before I introduced you to the proper statistics jargon. Let&#8217;s do that now.<\/p>\n<p>&nbsp;<\/p>\n<p>Populations have <em>parameters<\/em> and samples have <em>statistics<\/em>. <strong>We describe populations with their <em>parameters<\/em> while we describe samples with their <em>statistics<\/em>.<\/strong> When we study something, we are interested in the parameters of the population, however, in most cases it is difficult to collect the information to calculate them. What we do instead is <strong>we take a random sample of the population and calculate the <em>sample&#8217;s statistics<\/em>. 
We then use the sample statistics to <em>estimate <\/em>(i.e., infer) the <em>population parameters<\/em>.<\/strong> Thus, sample statistics used in this way are also called <em>estimators<\/em> of population parameters.<\/p>\n<p>&nbsp;<\/p>\n<p>For example, if we want to know the average age of Canadians, we could either do a census and ask everyone or simply take a nationally representative sample. Considering how expensive and time-consuming it would be to ask all 36.7 million Canadians (and Statistics Canada conducts the official census only every five years), we can poll a random selection of people across Canada, calculate their average age, and use <em>that<\/em> as an <em>estimate<\/em> of the average age of all Canadians<a class=\"footnote\" title=\"When people who have no statistics background learn of this, they usually protest that the information is not accurate because it's not based on everyone. What you will learn in this chapter is that you don't need everyone, and a sample is perfectly enough because random samples of sufficient size are mathematically proven to produce the best (closest, truest, most unbiased) estimates of the population parameters. To the extent that there is a difference between a statistic and the parameter it estimates, this difference is accounted for by reporting levels of certainty\/confidence. More on that later.\" id=\"return-footnote-701-1\" href=\"#footnote-701-1\" aria-label=\"Footnote 1\"><sup class=\"footnote\">[1]<\/sup><\/a>.<\/p>\n<p>&nbsp;<\/p>\n<p>In this example, the average age calculated based on the people in the sample is the <em>statistic<\/em> which we use to <em>estimate<\/em> the average age of all Canadians, the population <em>parameter<\/em>. All measures of central tendency and dispersion describing variables based on sample data are statistics. 
On the other hand, if we have data from all the population when calculating measures of central tendency and dispersion, we would have parameters.<\/p>\n<p>&nbsp;<\/p>\n<p>Consider if you will, examples I have used in past chapters: whenever the example was based on actual data from a dataset, and SPSS was used, this was sample data producing statistics<a class=\"footnote\" title=\"All datasets used in this book are nationally representative data collected by Statistics Canada.\" id=\"return-footnote-701-2\" href=\"#footnote-701-2\" aria-label=\"Footnote 2\"><sup class=\"footnote\">[2]<\/sup><\/a>. Even if we haven&#8217;t used statistics in this way yet, t<span style=\"text-indent: 18.6667px;font-size: 14pt\">hey <em>can<\/em> be used to estimate things about Canadians as a whole. On the other hand, any time I have used examples using hypothetical (imaginary) data about &#8220;your friends&#8221;, &#8220;your classmates&#8221;, &#8220;hours you have worked per week&#8221;, etc. can be considered as having population data, as we imagine we have all the information about those things, and there&#8217;s nothing to estimate.\u00a0\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<p>A final note concerns formal notation. <strong>To differentiate between statistics and parameters, we designate sample statistics by <em>Latin<\/em> letters but we denote population parameters by <em>Greek<\/em> letters.<\/strong><\/p>\n<p>&nbsp;<\/p>\n<p>You have already seen a ready-made example for this rule: recall our discussion on variance and standard deviation. In Section 4.4 (<a href=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/chapter\/4-4-standard-deviation\/\">https:\/\/pressbooks.bccampus.ca\/simplestats\/chapter\/4-4-standard-deviation\/<\/a>) I introduced formulas for\u00a0<em>\u03c3<\/em> and\u00a0<em>\u03c3<sup>2<\/sup>\u00a0<\/em>and I mentioned (without much explanation) that another &#8220;version&#8221; of these exist as\u00a0<em>s<\/em> and <em>s<sup>2<\/sup><\/em>. 
In truth, when we calculated the variance and the standard deviation with the hypothetical data in the examples, we needed the <em>population<\/em> standard deviation and variance (i.e., <em>\u03c3<\/em> and\u00a0<em>\u03c3<sup>2<\/sup><\/em>, respectively); but when we use SPSS with a dataset (i.e., sample data), we need the <em>sample<\/em> standard deviation and variance (i.e.,\u00a0<em>s<\/em> and <em>s<sup>2<\/sup><\/em>, respectively). Here they are again:<\/p>\n<p>&nbsp;<\/p>\n<p class=\"ql-center-displayed-equation\" style=\"line-height: 63px;\"><span class=\"ql-right-eqno\"> &nbsp; <\/span><span class=\"ql-left-eqno\"> &nbsp; <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-47c2d5edb5e66453ec165f3ae82ef045_l3.png\" height=\"63\" width=\"315\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"&#92;&#091;&#92;&#102;&#114;&#97;&#99;&#123;&#92;&#115;&#117;&#109;&#92;&#108;&#105;&#109;&#105;&#116;&#115;&#95;&#123;&#105;&#61;&#49;&#125;&#94;&#123;&#78;&#125;&#123;&#40;&#120;&#95;&#105;&#45;&#92;&#111;&#118;&#101;&#114;&#108;&#105;&#110;&#101;&#123;&#120;&#125;&#41;&#94;&#50;&#125;&#125;&#123;&#78;&#125;&#32;&#61;&#32;&#92;&#115;&#105;&#103;&#109;&#97;&#94;&#50;&#32;&#61;&#92;&#116;&#101;&#120;&#116;&#114;&#109;&#123;&#112;&#111;&#112;&#117;&#108;&#97;&#116;&#105;&#111;&#110;&#32;&#118;&#97;&#114;&#105;&#97;&#110;&#99;&#101;&#125;&#92;&#093;\" title=\"Rendered by QuickLaTeX.com\" \/><\/p>\n<p>&nbsp;<\/p>\n<p class=\"ql-center-displayed-equation\" style=\"line-height: 74px;\"><span class=\"ql-right-eqno\"> &nbsp; <\/span><span class=\"ql-left-eqno\"> &nbsp; <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-98fd16c772c4b6fe318013a68c9fe524_l3.png\" height=\"74\" width=\"466\" class=\"ql-img-displayed-equation quicklatex-auto-format\" 
alt=\"&#92;&#091;&#92;&#115;&#113;&#114;&#116;&#123;&#92;&#102;&#114;&#97;&#99;&#123;&#92;&#115;&#117;&#109;&#92;&#108;&#105;&#109;&#105;&#116;&#115;&#95;&#123;&#105;&#61;&#49;&#125;&#94;&#123;&#78;&#125;&#123;&#40;&#120;&#95;&#105;&#45;&#92;&#111;&#118;&#101;&#114;&#108;&#105;&#110;&#101;&#123;&#120;&#125;&#41;&#94;&#50;&#125;&#125;&#123;&#78;&#125;&#125;&#32;&#61;&#32;&#92;&#115;&#113;&#114;&#116;&#123;&#92;&#115;&#105;&#103;&#109;&#97;&#94;&#50;&#125;&#61;&#92;&#115;&#105;&#103;&#109;&#97;&#61;&#92;&#116;&#101;&#120;&#116;&#114;&#109;&#123;&#112;&#111;&#112;&#117;&#108;&#97;&#116;&#105;&#111;&#110;&#32;&#115;&#116;&#97;&#110;&#100;&#97;&#114;&#100;&#32;&#100;&#101;&#118;&#105;&#97;&#116;&#105;&#111;&#110;&#125;&#92;&#093;\" title=\"Rendered by QuickLaTeX.com\" \/><\/p>\n<p>&nbsp;<\/p>\n<p class=\"ql-center-displayed-equation\" style=\"line-height: 64px;\"><span class=\"ql-right-eqno\"> &nbsp; <\/span><span class=\"ql-left-eqno\"> &nbsp; <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-05323aa8c2c2107f7ae8f826ec25dd34_l3.png\" height=\"64\" width=\"283\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"&#92;&#091;&#92;&#102;&#114;&#97;&#99;&#123;&#92;&#115;&#117;&#109;&#92;&#108;&#105;&#109;&#105;&#116;&#115;&#95;&#123;&#105;&#61;&#49;&#125;&#94;&#123;&#78;&#125;&#123;&#40;&#120;&#95;&#105;&#45;&#92;&#111;&#118;&#101;&#114;&#108;&#105;&#110;&#101;&#123;&#120;&#125;&#41;&#94;&#50;&#125;&#125;&#123;&#78;&#45;&#49;&#125;&#32;&#61;&#32;&#115;&#94;&#50;&#32;&#61;&#92;&#116;&#101;&#120;&#116;&#114;&#109;&#123;&#115;&#97;&#109;&#112;&#108;&#101;&#32;&#118;&#97;&#114;&#105;&#97;&#110;&#99;&#101;&#125;&#92;&#093;\" title=\"Rendered by QuickLaTeX.com\" \/><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"text-indent: 18.6667px\"><span class=\"ql-right-eqno\"> &nbsp; <\/span><span class=\"ql-left-eqno\"> &nbsp; <\/span><img loading=\"lazy\" decoding=\"async\" 
src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-bf5c997070d9c7dda04f967d06d2b613_l3.png\" height=\"75\" width=\"431\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"&#92;&#091;&#92;&#115;&#113;&#114;&#116;&#123;&#92;&#102;&#114;&#97;&#99;&#123;&#92;&#115;&#117;&#109;&#92;&#108;&#105;&#109;&#105;&#116;&#115;&#95;&#123;&#105;&#61;&#49;&#125;&#94;&#123;&#78;&#125;&#123;&#40;&#120;&#95;&#105;&#45;&#92;&#111;&#118;&#101;&#114;&#108;&#105;&#110;&#101;&#123;&#120;&#125;&#41;&#94;&#50;&#125;&#125;&#123;&#78;&#45;&#49;&#125;&#125;&#32;&#61;&#32;&#92;&#115;&#113;&#114;&#116;&#123;&#115;&#94;&#50;&#125;&#61;&#115;&#61;&#92;&#116;&#101;&#120;&#116;&#114;&#109;&#123;&#115;&#97;&#109;&#112;&#108;&#101;&#32;&#115;&#116;&#97;&#110;&#100;&#97;&#114;&#100;&#32;&#100;&#101;&#118;&#105;&#97;&#116;&#105;&#111;&#110;&#125;&#92;&#093;\" title=\"Rendered by QuickLaTeX.com\" \/>\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<p>I&#8217;ll take this opportunity to finally explain why we need the difference in the formulas (i.e., to divide by <em>N-1<\/em> in the <em>sample<\/em> formulas but by <em>N<\/em> in the <em>population<\/em> formulas). Considering that the sample statistics <em>estimate<\/em> the population parameters but are arguably different from the exact parameters &#8212; i.e., some uncertainty exists, as inference is not a perfect &#8220;guess&#8221; &#8212; to assume what we obtain from a sample is exactly the same as the population would be a biased estimation. Thus, the <em>N-1<\/em> is meant to correct that bias<a class=\"footnote\" title=\"This is called Bessel's correction, by the name of Friedrich Bessel who introduced it.\" id=\"return-footnote-701-3\" href=\"#footnote-701-3\" aria-label=\"Footnote 3\"><sup class=\"footnote\">[3]<\/sup><\/a> (which it does for the variance, and does to an extent for the standard deviation). 
<strong>What we have then is that <em>s<sup>2<\/sup><\/em> is an unbiased estimator of\u00a0<em>\u03c3<sup>2<\/sup><\/em>, while <em>s<\/em> is the conventional (slightly biased) estimator of\u00a0<em>\u03c3<\/em>.<\/strong><\/p>\n<p>&nbsp;<\/p>\n<p>Thus it should be clear why we use the <em>s<\/em> and <em>s<sup>2<\/sup><\/em>\u00a0 formulas when working with datasets and SPSS &#8212; as the actual data has been collected from respondents randomly selected from\u00a0<span style=\"text-indent: 18.6667px;font-size: 14pt\">a population of interest and comprising\u00a0<\/span><span style=\"text-indent: 1em;font-size: 14pt\">a sample of specific size. On the other hand, when we have data about everyone\/everything we&#8217;re interested in (like in the small-scale examples with made-up data), we have a <\/span><em style=\"text-indent: 1em;font-size: 14pt\">de facto<\/em><span style=\"text-indent: 1em;font-size: 14pt\"> population on our hands &#8212; hence the\u00a0<\/span><em style=\"text-indent: 1em;font-size: 14pt\">\u03c3<\/em><span style=\"text-indent: 1em;font-size: 14pt\"> and\u00a0<\/span><em style=\"text-indent: 1em;font-size: 14pt\">\u03c3<sup>2<\/sup> <\/em><span style=\"text-indent: 1em;font-size: 14pt\">formulas are appropriate. In the former case, the findings can be extrapolated to the population (acknowledging that we are dealing with inferred estimates); in the latter case, there is nothing further to extrapolate as we are calculating the parameters directly.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p>Another important parameter to note (as we will be using it a lot from now on) is the population mean, designated by the small-case Greek letter for <em>m<\/em>\u00a0(from <em>mean<\/em>) &#8212; <em>\u03bc<\/em>, which I introduced in Section 5.1.2 (<a href=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/chapter\/5-1-2-the-z-value\/\">https:\/\/pressbooks.bccampus.ca\/simplestats\/chapter\/5-1-2-the-z-value\/<\/a>) without giving you a reason why. 
Unlike the correspondence between <em>s<\/em> and\u00a0<em>\u03c3<\/em>, however, we don&#8217;t usually denote the sample mean with an <em>m<\/em>; as you know, we use <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-0d00c2da2b2541a97ae0ac3c10e1504e_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#111;&#118;&#101;&#114;&#108;&#105;&#110;&#101;&#123;&#120;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"11\" width=\"11\" style=\"vertical-align: 0px;\" \/> instead (so that we know which variable&#8217;s mean we have in mind).<\/p>\n<p>&nbsp;<\/p>\n<p>Finally, when a parameter is being estimated by an estimator, it is designated by a &#8220;hat&#8221; on top: for example, if we have a sample statistic called <em>a<\/em> estimating a population parameter\u00a0<em>\u03b1<\/em><a class=\"footnote\" title=\"This is the small-case Greek letter a: \u03b1, pronounced &quot;AL-pha&quot;.\" id=\"return-footnote-701-4\" href=\"#footnote-701-4\" aria-label=\"Footnote 4\"><sup class=\"footnote\">[4]<\/sup><\/a>, the estimated\u00a0<em>\u03b1<\/em> will be <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-89ecd8603670c36cb03393eea395c246_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#104;&#97;&#116;&#123;&#92;&#97;&#108;&#112;&#104;&#97;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"13\" width=\"11\" style=\"vertical-align: 0px;\" \/>, pronounced &#8220;alpha-hat&#8221;. 
By analogy, if a statistic <em>b<\/em> estimates a parameter\u00a0<em>\u03b2<\/em><a class=\"footnote\" title=\"This is the small-case Greek letter b: \u03b2, pronounced &quot;BAY-ta&quot;.\" id=\"return-footnote-701-5\" href=\"#footnote-701-5\" aria-label=\"Footnote 5\"><sup class=\"footnote\">[5]<\/sup><\/a>, the estimated\u00a0<em>\u03b2<\/em> will be <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-ef8f1f3059529504292816288a1a2454_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#104;&#97;&#116;&#123;&#92;&#98;&#101;&#116;&#97;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"22\" width=\"11\" style=\"vertical-align: -4px;\" \/>, pronounced &#8220;beta-hat&#8221;.<\/p>\n<p>&nbsp;<\/p>\n<p>Thus, the logic of inference tells us that while <em>a<\/em> = <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-89ecd8603670c36cb03393eea395c246_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#104;&#97;&#116;&#123;&#92;&#97;&#108;&#112;&#104;&#97;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"13\" width=\"11\" style=\"vertical-align: 0px;\" \/> and <em>b<\/em> = <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-ef8f1f3059529504292816288a1a2454_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#104;&#97;&#116;&#123;&#92;&#98;&#101;&#116;&#97;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"22\" width=\"11\" style=\"vertical-align: -4px;\" \/> (i.e., the statistics are estimators for the parameters),\u00a0<em>a<\/em> = <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-6340d89578cd0bd5d94e07a999671110_l3.png\" class=\"ql-img-inline-formula 
quicklatex-auto-format\" alt=\"&#92;&#104;&#97;&#116;&#123;&#92;&#97;&#108;&#112;&#104;&#97;&#125;&#92;&#110;&#101;&#113;&#92;&#97;&#108;&#112;&#104;&#97;\" title=\"Rendered by QuickLaTeX.com\" height=\"17\" width=\"46\" style=\"vertical-align: -4px;\" \/> and <em>b<\/em> = <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-97a9f3ce84fb2fe8c6a5151c26faf3cf_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#104;&#97;&#116;&#123;&#92;&#98;&#101;&#116;&#97;&#125;&#92;&#110;&#101;&#113;&#92;&#98;&#101;&#116;&#97;\" title=\"Rendered by QuickLaTeX.com\" height=\"22\" width=\"46\" style=\"vertical-align: -4px;\" \/>. <strong>That is, the statistics <\/strong>(a.k.a. estimators)<strong> are not the same as the parameters.<\/strong> More on this, next.<\/p>\n<hr class=\"before-footnotes clear\" \/><div class=\"footnotes\"><ol><li id=\"footnote-701-1\">When people who have no statistics background learn of this, they usually protest that the information is not accurate because it's not based on <em>everyone<\/em>. What you will learn in this chapter is that you don't <em>need<\/em> everyone, and a sample is perfectly enough because random samples of sufficient size are mathematically proven to produce the best (closest, truest, most unbiased) estimates of the population parameters. To the extent that there is a difference between a statistic and the parameter it estimates, this difference is accounted for by reporting levels of certainty\/confidence. More on that later. <a href=\"#return-footnote-701-1\" class=\"return-footnote\" aria-label=\"Return to footnote 1\">&crarr;<\/a><\/li><li id=\"footnote-701-2\">All datasets used in this book are nationally representative data collected by Statistics Canada. 
<a href=\"#return-footnote-701-2\" class=\"return-footnote\" aria-label=\"Return to footnote 2\">&crarr;<\/a><\/li><li id=\"footnote-701-3\">This is called <em>Bessel's correction<\/em>, by the name of Friedrich Bessel who introduced it. <a href=\"#return-footnote-701-3\" class=\"return-footnote\" aria-label=\"Return to footnote 3\">&crarr;<\/a><\/li><li id=\"footnote-701-4\">This is the small-case Greek letter <em>a<\/em>: <em>\u03b1<\/em>, pronounced \"AL-pha\". <a href=\"#return-footnote-701-4\" class=\"return-footnote\" aria-label=\"Return to footnote 4\">&crarr;<\/a><\/li><li id=\"footnote-701-5\">This is the small-case Greek letter <em>b<\/em>: <em>\u03b2<\/em>, pronounced \"BAY-ta\". <a href=\"#return-footnote-701-5\" class=\"return-footnote\" aria-label=\"Return to footnote 5\">&crarr;<\/a><\/li><\/ol><\/div>","protected":false},"author":533,"menu_order":4,"template":"","meta":{"pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[],"contributor":[],"license":[],"class_list":["post-701","chapter","type-chapter","status-publish","hentry"],"part":32,"_links":{"self":[{"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/pressbooks\/v2\/chapters\/701","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/wp\/v2\/users\/533"}],"version-history":[{"count":25,"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/pressbooks\/v2\/chapters\/701\/revisions"}],"predecessor-version":[{"id":2050,"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/pressbooks\/v2\/chapters\/701\/revisions\/2050"}],"part":[{"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/pressbooks\/v2\/parts\/32"}],"metadat
a":[{"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/pressbooks\/v2\/chapters\/701\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/wp\/v2\/media?parent=701"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/pressbooks\/v2\/chapter-type?post=701"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/wp\/v2\/contributor?post=701"},{"taxonomy":"license","embeddable":true,"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/wp\/v2\/license?post=701"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}