{"id":1137,"date":"2019-04-06T02:17:08","date_gmt":"2019-04-06T06:17:08","guid":{"rendered":"https:\/\/pressbooks.bccampus.ca\/simplestats\/?post_type=chapter&#038;p=1137"},"modified":"2019-11-02T19:46:06","modified_gmt":"2019-11-02T23:46:06","slug":"9-1-between-a-discrete-and-a-continuous-variable","status":"publish","type":"chapter","link":"https:\/\/pressbooks.bccampus.ca\/simplestats\/chapter\/9-1-between-a-discrete-and-a-continuous-variable\/","title":{"raw":"9.1 Between a Discrete and a Continuous Variable: The t-test","rendered":"9.1 Between a Discrete and a Continuous Variable: The t-test"},"content":{"raw":"[latexpage]\r\n\r\nFor this part, you need to recall (from Section 7.2.1, <a href=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/chapter\/7-2-1-between-a-discrete-and-a-continuous-variable\/\">https:\/\/pressbooks.bccampus.ca\/simplestats\/chapter\/7-2-1-between-a-discrete-and-a-continuous-variable\/<\/a>) how we described bivariate associations between two variables, one of which is treated as discrete and one as continuous. In this case we essentially compared the groups (categories of the discrete variable) by their mean (or median) value on the continuous variable.\u00a0 We examine the potential association between such variables visually through boxplots and numerically through a difference of means.\r\n\r\n&nbsp;\r\n\r\nNow the question in front of us is: even if we do see a difference in the means of the different groups <em>in sample data<\/em>, how certain can we be that this association is real and reflective of the population? As we learned in Chapter 8, to answer this question, we need to test the difference for statistical significance.\r\n\r\n&nbsp;\r\n\r\nWe start with a few theoretical notes, which we will then apply to the example I used in Chapter 7 about the potential gender difference in average income. 
In this way we will be able to test whether the difference observed in the <em>NHS 2011<\/em> data (\\$16,401 in favour of men to be precise) is statistically significant or not. In the latter half of this section we will see what happens when there are more than two groups' means to compare.\r\n\r\n&nbsp;\r\n\r\n<strong>Testing the difference of two means.<\/strong> Recall from Section 8.3 (<a href=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/chapter\/8-3-hypothesis-testing\/\">https:\/\/pressbooks.bccampus.ca\/simplestats\/chapter\/8-3-hypothesis-testing\/<\/a>) that we tested whether the employees who took a training course indeed had a higher average productivity by simply calculating the <em>z<\/em>-value (or, using the estimated standard error, the <em>t<\/em>-value with a given <em>df<\/em>) for the mean and then finding its associated <em>p<\/em>-value. We could then compare the <em>p<\/em>-value to the preselected\u00a0<em>\u03b1<\/em>-level and make a conclusion regarding the null hypothesis.\r\n\r\n&nbsp;\r\n\r\nYou will be happy to know that testing a difference of means follows the same principle: obtain the\u00a0<em>z <\/em>(or rather, the\u00a0<em>t<\/em>-value), get the associated\u00a0<em>p<\/em>-value, compare to the\u00a0<em>\u03b1<\/em>. What is not the same is that now we are testing expressly a difference of two means -- so we need the\u00a0<em>t<\/em>-value for the<em> difference<\/em>. 
It turns out we can calculate one as easily as ever, as long as we have the standard error of the <em>difference<\/em>[footnote]I hope you have not forgotten that $z=\\frac{\\overline{x}-\\mu}{\\sigma_{\\overline{x}}}$, where the standard error $\\sigma_{\\overline{x}}$ $=\\frac{\\sigma}{\\sqrt{N}}$.[\/footnote].\r\n\r\n&nbsp;\r\n\r\n<strong>The standard error of a difference of two means is a combination of their separate standard errors:<\/strong>\r\n\r\n&nbsp;\r\n\r\n$\\sigma_{\\overline{x}_1-\\overline{x}_2}$ $=\\sqrt{\\frac{\\sigma_1^2}{N_1}+\\frac{\\sigma_2^2}{N_2}}$ = <em>standard error of the difference of two means<\/em>\r\n\r\n&nbsp;\r\n\r\nwhere the subscripts refer to the first and second group being compared.\r\n\r\n&nbsp;\r\n\r\nThe <em>z<\/em>-value for a difference of two means follows the ordinary <em>z<\/em>-value formula, but with the <em>difference<\/em> taking the place of the single mean:\r\n\r\n&nbsp;\r\n\r\n$z=\\frac{(\\overline{x}_1 -\\overline{x}_2)-(\\mu_1 -\\mu_2 )}{\\sigma_{\\overline{x}_1-\\overline{x}_2}}$\r\n\r\n&nbsp;\r\n\r\nHowever, under the null hypothesis we hypothesize there is no difference in the population means, that is, $\\mu_1=\\mu_2$, and thus $\\mu_1-\\mu_2=0$. 
Accounting for that in the formula, along with substituting the standard error with its own formula from above, we get:\r\n\r\n&nbsp;\r\n\r\n$z=\\frac{\\overline{x}_1 -\\overline{x}_2}{\\sqrt{\\frac{\\sigma_1^2}{N_1}+\\frac{\\sigma_2^2}{N_2}}}$\r\n\r\n&nbsp;\r\n\r\nFinally, since we generally don't know the population parameters but work with sample data, we estimate the population standard deviation\u00a0<em>\u03c3<\/em> with the sample standard deviation <em>s<\/em>, thus moving to the\u00a0<strong><em>t<\/em>-value\u00a0through which we test the difference for statistical significance:<\/strong>\r\n\r\n&nbsp;\r\n\r\n$t=\\frac{\\overline{x}_1 -\\overline{x}_2}{\\sqrt{\\frac{s_1^2}{N_1}+\\frac{s_2^2}{N_2}}}$ = <em>t-test for the difference of means<\/em>[footnote]The more observant of you will notice that the squared standard deviations of the two groups, i.e., the\u00a0<em>s<sub>1<\/sub><sup>2<\/sup><\/em>\u00a0and <em>s<sub>2<\/sub><sup>2<\/sup><\/em> here, are of course the groups' variances (which we need if we are to have them under the square root). In this version of the formula, the groups are taken to have <em>unequal<\/em> variances, which is a more conservative assumption than assuming the variances of the two groups are equal. 
If we have a good reason to assume <em>equal<\/em> variances, then <em>s<sub>1<\/sub><sup>2<\/sup><\/em> and <em>s<sub>2<\/sub><sup>2<\/sup><\/em>\u00a0will just be the same (combined, or pooled) variance<em> s<sup>2<\/sup><\/em>, and the formula will look like this:\r\n\r\n&nbsp;\r\n\r\n$t=\\frac{\\overline{x}_1 -\\overline{x}_2}{s\\sqrt{\\frac{1}{N_1}+\\frac{1}{N_2}}}$\u00a0[\/footnote]\r\n\r\n&nbsp;\r\n\r\nNote that unlike the single value case where the <em>df=N<\/em>-1, when working with a difference of means of two groups the <em>df=N<\/em>-2, where <em>N<\/em> is the combined size of the two groups.\r\n\r\n&nbsp;\r\n\r\nBefore your eyes glaze over (completely), rest assured that SPSS calculates this for you; I only provide it here to show you that the logic of hypothesis testing is the same, only the formulas change to accommodate the testing of a <em>difference of means<\/em> rather than a single mean.\r\n\r\n&nbsp;\r\n\r\nFrom this point on, it's easy: you only need to check the <em>p<\/em>-value of the <em>t<\/em>-value you have obtained (given the specific <em>df<\/em>)[footnote]You can do that through an online <em>p<\/em>-value calculator for the <em>t<\/em>-distribution like this one here: <a href=\"https:\/\/www.socscistatistics.com\/pvalues\/tdistribution.aspx\">https:\/\/www.socscistatistics.com\/pvalues\/tdistribution.aspx<\/a>.[\/footnote], and compare it to the significance level, and <em>voila<\/em> -- you have yourself a significance test!\r\n\r\n&nbsp;\r\n\r\nLet's see how this all works out in an example.\u00a0 A few sections back I promised to test the gender differences in average income, didn't I?\r\n\r\n&nbsp;\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<p class=\"textbox__title\"><em>Example 9.1 Testing Gender Differences in Average Income, NHS 
2011\u00a0<\/em><\/p>\r\n\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n\r\n&nbsp;\r\n\r\nAs in Example 7.2 in Section 7.2.1, I use a random sample of about 3 percent of the entire <em>NHS 2011<\/em> data, this time resulting in <em>N<\/em>=21,902[footnote]Since I use a new random sub-sample of the data, you can consider this an indirect illustration of sampling variation. For comparison of sample statistics as well as variable description, refer back to Example 7.2.[\/footnote].\r\n\r\n&nbsp;\r\n\r\nWe are still interested in whether women and men on average earn differently per year, i.e., whether <em>gender<\/em> affects <em>income<\/em>:\r\n<ul>\r\n \t<li>H<sub>0<\/sub>: The average annual income of women and men is the same, $\\mu_m =\\mu_f$<\/li>\r\n \t<li>H<sub>a<\/sub>: The average annual income of women and men is different, $\\mu_m \\neq\\mu_f$<\/li>\r\n<\/ul>\r\nThere are 11,323 women (<em>N<sub>f<\/sub><\/em>=11,323) and 10,579 men (<em>N<sub>m<\/sub><\/em>=10,579) in the sample. The men earn an average of \\$48,113 ($\\overline{x}_m =48113$) and women earn an average of \\$31,519 ($\\overline{x}_f =31519$).\u00a0 The respective standard deviations are \\$68,214 for men ($s_m =68214$) and \\$34,760 for women ($s_f=34760$).\r\n\r\n&nbsp;\r\n\r\nThe difference of means is therefore:\r\n\r\n&nbsp;\r\n\r\n$\\overline{x}_m -\\overline{x}_f =48113-31519=16594$\r\n\r\n&nbsp;\r\n\r\nThe question is whether this \\$16,594 is due to sampling variation (i.e., statistically not different from a population difference of means of \\$0), or unusual enough that a population difference of \\$0 is unlikely (i.e., the difference is statistically significant).\r\n\r\n&nbsp;\r\n\r\nTo test this, we need to calculate the standard error of the difference. 
Once we have the standard error of the difference, we can calculate the <em>t<\/em>-value.\r\n\r\n&nbsp;\r\n\r\nThe standard error of the difference is:\r\n\r\n&nbsp;\r\n\r\n$s_{\\overline{x}_m-\\overline{x}_f}$ = $\\sqrt{\\frac{s_m^2}{N_m}+\\frac{s_f^2}{N_f}}=\\sqrt{\\frac{68214^2}{10579}+\\frac{34760^2}{11323}}=\\sqrt{439848+106708}\\approx 739.3$\r\n\r\n&nbsp;\r\n\r\nThe <em>t<\/em>-value is then:\r\n\r\n&nbsp;\r\n\r\n$t=\\frac{\\overline{x}_m -\\overline{x}_f}{s_{\\overline{x}_m-\\overline{x}_f}}=\\frac{16594}{739.3}\\approx 22.446$\r\n\r\n&nbsp;\r\n\r\nGiven the large <em>N<\/em>, even just looking at the <em>t<\/em>-value should make it clear that the difference is statistically significant -- after all, in a two-tailed test, the <em>t<\/em>-value is significant at 1.96 and above (for\u00a0<em>\u03b1<\/em>=0.05) and at 2.58 and above (for\u00a0<em>\u03b1<\/em>=0.01).\r\n\r\n&nbsp;\r\n\r\nStill, this is not the way to report a test -- this is: <strong>With <em>t<\/em>=22.446, <em>df<\/em>=21,900, and <em>p<\/em>=0.000<\/strong>[footnote]You can check this with a <em>p<\/em>-value calculator; SPSS reports it too.[\/footnote]<strong> (i.e., <em>p<\/em>&lt;0.001)<\/strong>[footnote]That is, the probability of observing a difference of at least \\$16,594 in the sample if there were no difference in the population is smaller than 0.1%.[\/footnote]<strong>, we have enough evidence to reject the null hypothesis. Indeed, at the 0.001 significance level we can conclude that there is a statistically significant difference between the average annual income of men and women (i.e., that the difference exists in the population).<\/strong>\r\n\r\n&nbsp;\r\n\r\nWe can check this with a confidence interval too, again substituting the difference in place of a single value[footnote]I hope you remember that 95% CI: $\\overline{x} \\pm 1.96\\times s_{\\overline{x}}$. 
[\/footnote]:\r\n\r\n&nbsp;\r\n\r\n95% CI: $\\overline{x}_m - \\overline{x}_f \\pm 1.96\\times s_{\\overline{x}_m-\\overline{x}_f}$ = $16594\u00a0\\pm 1.96 \\times 739.3 = 16594 \\pm 1449 = (15145; 18043)$\r\n\r\n&nbsp;\r\n\r\nThat is, we can say that <b>with 95% confidence, the difference of average annual incomes between men and women in the population is between \\$15,145 and \\$18,043 (i.e., \\$16,594\u00a0$\\pm$ \\$1,449); about 19 out of 20 intervals constructed this way would contain the true population difference.<\/b> (We also see the correspondence with hypothesis testing: since the interval\u00a0does <em>not<\/em> contain 0, 0 is not a plausible value for the difference.)\r\n\r\n&nbsp;\r\n\r\nInference is not doing too badly, no?\r\n\r\n<\/div>\r\n<\/div>\r\n&nbsp;\r\n\r\nAgain, SPSS will provide all the calculations but I advise you to still test your understanding of the procedure with the following exercise.\r\n\r\n&nbsp;\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<p class=\"textbox__title\"><em>Do It!! 9.1 Gender Differences in Age of Actors in Main Roles<\/em><\/p>\r\n\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n\r\n&nbsp;\r\n\r\nStudies find that due to the gendered social construction of aging (i.e., women are considered \"older\" and \"mature\" at younger ages than men), male actors are frequently paired with much younger female actors (Buchanan 2013; Follows 2015). For example, the average age of male and female Academy Award nominees is telling: in the Best Actor\/Actress categories, the average age of men is 43.4 years while the average age of women is 37.2 years (Beckwith &amp; Hester, 2018 [http:\/\/thedataface.com\/2018\/03\/culture\/oscar-nominees-age]).\r\n\r\n&nbsp;\r\n\r\nLet's say that you want to investigate this phenomenon yourself. You randomly select 100 male and 100 female\u00a0<span style=\"font-size: 1rem\">Academy Award nominees, and calculate their age at nomination for an Academy Award. 
You find that men's average age is 45 years and women's is 36 years, with standard deviations of 15 years for men and 20 years for women. Test the hypothesis that the average age for women is different from that of men for the population of all Best Actor\/Actress Oscar nominees. Create a 95% CI for the difference to see its correspondence with the hypothesis test.<\/span>\r\n\r\n&nbsp;\r\n\r\n<\/div>\r\n<\/div>\r\n&nbsp;\r\n\r\nNow that you understand the principle of testing the difference of two means, let's see what we can do about non-binary discrete variables, in the next section. The SPSS guidelines for doing a <em>t<\/em>-test are below.\r\n\r\n&nbsp;\r\n<div class=\"textbox textbox--key-takeaways\"><header class=\"textbox__header\">\r\n<p class=\"textbox__title\"><em>SPSS Tip 9.1 The t-test<\/em><\/p>\r\n\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<ul>\r\n \t<li>From the <em>Main Menu<\/em>, select <em>Analyze<\/em>, and from the pull-down menu, click on <em>Compare Means<\/em> and <em>Independent Samples T Test<\/em>;<\/li>\r\n \t<li>Select your continuous variable from the list of variables on the left and, using the top arrow, move it to the <em>Test Variable(s)<\/em> empty space on the right;<\/li>\r\n \t<li>Select your discrete variable from the list of variables on the left and, using the bottom arrow, move it to the <em>Grouping Variable<\/em> empty space on the right;<\/li>\r\n \t<li>Click on <em>Define Groups<\/em>, and in the new window, keep <em>Use specified values<\/em> selected; in the empty spaces for <em>Group 1<\/em> and <em>Group 2<\/em>, enter the <em>numeric<\/em> values[footnote]That would be the \"code\" -- for example, <em>gender<\/em> may be coded as \"1 female, 2 male\", or \"0 male, 1 female\", etc., depending on the dataset. 
You have to know this beforehand; if unsure, go back to Variable View and check.[\/footnote] corresponding to the two categories of your discrete variable; click <em>Continue<\/em>.<\/li>\r\n \t<li>In the <em>Independent Samples T Test<\/em> window click <em>Options<\/em>...; you can request a specific confidence level in the new window (the default is 95%); click <em>Continue<\/em>;<\/li>\r\n \t<li>\u00a0Click <em>OK<\/em> once back in the <em>Independent Samples T Test<\/em> window.<\/li>\r\n \t<li>SPSS will produce two tables in the <em>Output<\/em> window: a <em>Group Statistics<\/em> one (where you can see the sample size, mean, standard deviation, and standard error for each group, i.e., each category of the discrete variable), and an <em>Independent Samples Test<\/em>\u00a0one (where you can find the <em>t<\/em>-value, <em>df<\/em>, <em>p<\/em>-value, mean difference, standard error of the difference, and the requested confidence interval)[footnote] The table provides two versions of the test: <em>with<\/em> and <em>without<\/em> equal variances assumed. Which one you should use depends on the size of the two groups' variances. If the variance of one group is twice (or more) as big as the other group's variance (like in Example 9.1 above, where the men's variance was much larger than the women's), use the test results in the bottom row, \"equal variances not assumed\". If the two groups' variances are relatively similar, you can use the top row, \"equal variances assumed\". You don't have to decide on your own, as SPSS provides a convenient indication of which one is better to use, under<em> Levene's Test\/F<\/em> for comparing variances. 
If the <em>F<\/em>-test is significant (i.e., <em>p<\/em>\u22640.05), the variances are too different and using the bottom row is better; if the <em>F<\/em>-test is non-significant (i.e., <em>p<\/em>&gt;0.05) you can assume the variances are equal and use the top row of results.[\/footnote].<\/li>\r\n<\/ul>\r\n<\/div>\r\n<\/div>\r\n&nbsp;","rendered":"<p>For this part, you need to recall (from Section 7.2.1, <a href=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/chapter\/7-2-1-between-a-discrete-and-a-continuous-variable\/\">https:\/\/pressbooks.bccampus.ca\/simplestats\/chapter\/7-2-1-between-a-discrete-and-a-continuous-variable\/<\/a>) how we described bivariate associations between two variables, one of which is treated as discrete and one as continuous. In this case we essentially compared the groups (categories of the discrete variable) by their mean (or median) value on the continuous variable.\u00a0 We examine the potential association between such variables visually through boxplots and numerically through a difference of means.<\/p>\n<p>&nbsp;<\/p>\n<p>Now the question in front of us is: even if we do see a difference in the means of the different groups <em>in sample data<\/em>, how certain can we be that this association is real and reflective of the population? As we learned in Chapter 8, to answer this question, we need to test the difference for statistical significance.<\/p>\n<p>&nbsp;<\/p>\n<p>We start with a few theoretical notes, which we will then apply to the example I used in Chapter 7 about the potential gender difference in average income. In this way we will be able to test whether the difference observed in the <em>NHS 2011<\/em> data (&#36;16,401 in favour of men to be precise) is statistically significant or not. 
In the latter half of this section we will see what happens when there are more than two groups&#8217; means to compare.<\/p>\n<p>&nbsp;<\/p>\n<p><strong>Testing the difference of two means.<\/strong> Recall from Section 8.3 (<a href=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/chapter\/8-3-hypothesis-testing\/\">https:\/\/pressbooks.bccampus.ca\/simplestats\/chapter\/8-3-hypothesis-testing\/<\/a>) that we tested whether the employees who took a training course indeed had a higher average productivity by simply calculating the <em>z<\/em>-value (or, using the estimated standard error, the <em>t<\/em>-value with a given <em>df<\/em>) for the mean and then finding its associated <em>p<\/em>-value. We could then compare the <em>p<\/em>-value to the preselected\u00a0<em>\u03b1<\/em>-level and make a conclusion regarding the null hypothesis.<\/p>\n<p>&nbsp;<\/p>\n<p>You will be happy to know that testing a difference of means follows the same principle: obtain the\u00a0<em>z <\/em>(or rather, the\u00a0<em>t<\/em>-value), get the associated\u00a0<em>p<\/em>-value, compare to the\u00a0<em>\u03b1<\/em>. What is not the same is that now we are testing expressly a difference of two means &#8212; so we need the\u00a0<em>t<\/em>-value for the<em> difference<\/em>. 
It turns out, we can calculate one as easily as ever, as long as we had the standard error of the <em>difference<\/em><a class=\"footnote\" title=\"I hope you have not forgotten that , where the standard error  .\" id=\"return-footnote-1137-1\" href=\"#footnote-1137-1\" aria-label=\"Footnote 1\"><sup class=\"footnote\">[1]<\/sup><\/a>.<\/p>\n<p>&nbsp;<\/p>\n<p><strong>The standard error of a difference of two means is a combination of their separate standard errors:<\/strong><\/p>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-77108f7881ce19ec1766fa4bc9fb188b_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#115;&#105;&#103;&#109;&#97;&#95;&#40;&#92;&#111;&#118;&#101;&#114;&#108;&#105;&#110;&#101;&#123;&#120;&#125;&#95;&#49;&#45;&#92;&#111;&#118;&#101;&#114;&#108;&#105;&#110;&#101;&#123;&#120;&#125;&#95;&#50;&#41;\" title=\"Rendered by QuickLaTeX.com\" height=\"21\" width=\"80\" style=\"vertical-align: -7px;\" \/> <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-8d77fafe509f10e957dd1a4143a27583_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#61;&#92;&#115;&#113;&#114;&#116;&#123;&#92;&#102;&#114;&#97;&#99;&#123;&#92;&#115;&#105;&#103;&#109;&#97;&#95;&#49;&#94;&#50;&#125;&#123;&#78;&#95;&#49;&#125;&#43;&#92;&#102;&#114;&#97;&#99;&#123;&#92;&#115;&#105;&#103;&#109;&#97;&#95;&#50;&#94;&#50;&#125;&#123;&#78;&#95;&#50;&#125;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"33\" width=\"101\" style=\"vertical-align: -10px;\" \/> = <em>standard error of the difference of two means<\/em><\/p>\n<p>&nbsp;<\/p>\n<p>where the subscripts refer to the first and second group being compared.<\/p>\n<p>&nbsp;<\/p>\n<p>The <em>z<\/em>-value for a difference of two means follows the ordinary <em>z<\/em>-value formula, but with the 
<em>difference<\/em> taking the place of the single mean:<\/p>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-1da6b9c8dbd3669f1a822fa1df8bb903_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#122;&#61;&#92;&#102;&#114;&#97;&#99;&#123;&#40;&#92;&#111;&#118;&#101;&#114;&#108;&#105;&#110;&#101;&#123;&#120;&#95;&#49;&#125;&#32;&#45;&#92;&#111;&#118;&#101;&#114;&#108;&#105;&#110;&#101;&#123;&#120;&#95;&#50;&#125;&#41;&#45;&#40;&#92;&#109;&#117;&#95;&#49;&#32;&#45;&#92;&#109;&#117;&#95;&#50;&#32;&#41;&#125;&#123;&#92;&#115;&#105;&#103;&#109;&#97;&#95;&#40;&#92;&#111;&#118;&#101;&#114;&#108;&#105;&#110;&#101;&#123;&#120;&#125;&#95;&#49;&#45;&#92;&#111;&#118;&#101;&#114;&#108;&#105;&#110;&#101;&#123;&#120;&#125;&#95;&#50;&#41;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"31\" width=\"148\" style=\"vertical-align: -11px;\" \/><\/p>\n<p>&nbsp;<\/p>\n<p>However, under the null hypothesis we hypothesize there is no difference in the population means, as such <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-5c6bd7de1201a465d4d94717e0db47b6_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#109;&#117;&#95;&#49;&#61;&#92;&#109;&#117;&#95;&#50;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"60\" style=\"vertical-align: -4px;\" \/>, and thus <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-68323cf1003e8e0e4e31cda0a32f190a_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#109;&#117;&#95;&#49;&#45;&#92;&#109;&#117;&#95;&#50;&#61;&#48;\" title=\"Rendered by QuickLaTeX.com\" height=\"16\" width=\"91\" style=\"vertical-align: -4px;\" \/>. 
Accounting for that in the formula, along with substituting the standard error with its own formula from above, we get:<\/p>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-2c2252ee2e31f09b841e38169079677c_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#122;&#61;&#92;&#102;&#114;&#97;&#99;&#123;&#92;&#111;&#118;&#101;&#114;&#108;&#105;&#110;&#101;&#123;&#120;&#95;&#49;&#125;&#32;&#45;&#92;&#111;&#118;&#101;&#114;&#108;&#105;&#110;&#101;&#123;&#120;&#95;&#50;&#125;&#125;&#123;&#92;&#115;&#113;&#114;&#116;&#123;&#92;&#102;&#114;&#97;&#99;&#123;&#92;&#115;&#105;&#103;&#109;&#97;&#95;&#49;&#94;&#50;&#125;&#123;&#78;&#95;&#49;&#125;&#43;&#92;&#102;&#114;&#97;&#99;&#123;&#92;&#115;&#105;&#103;&#109;&#97;&#95;&#50;&#94;&#50;&#125;&#123;&#78;&#95;&#50;&#125;&#125;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"45\" width=\"100\" style=\"vertical-align: -29px;\" \/><\/p>\n<p>&nbsp;<\/p>\n<p>Finally, since we generally don&#8217;t know the population parameters but work with sample data, we estimate the standard error\u00a0<em>\u03c3<\/em> with the sample standard error <em>s<\/em>, thus moving to the\u00a0<strong><em>t<\/em>-value\u00a0through which we test the difference for statistical significance:<\/strong><\/p>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-0ee9ca7853742bb710eff2869087417d_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" 
alt=\"&#116;&#61;&#92;&#102;&#114;&#97;&#99;&#123;&#92;&#111;&#118;&#101;&#114;&#108;&#105;&#110;&#101;&#123;&#120;&#95;&#49;&#125;&#32;&#45;&#92;&#111;&#118;&#101;&#114;&#108;&#105;&#110;&#101;&#123;&#120;&#95;&#50;&#125;&#125;&#123;&#92;&#115;&#113;&#114;&#116;&#123;&#92;&#102;&#114;&#97;&#99;&#123;&#115;&#95;&#49;&#94;&#50;&#125;&#123;&#78;&#95;&#49;&#125;&#43;&#92;&#102;&#114;&#97;&#99;&#123;&#115;&#95;&#50;&#94;&#50;&#125;&#123;&#78;&#95;&#50;&#125;&#125;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"45\" width=\"97\" style=\"vertical-align: -29px;\" \/> = <em>t-test for the difference of means<\/em><a class=\"footnote\" title=\"The more observant of you would notice that the squared standard deviations of the two groups, i.e., the\u00a0s12\u00a0and s22 here are of course the groups' variances (which we need if we are to have them under the square root). In this version of the formula, the groups are taken to have unequal variances, which is a more conservative assumption than assuming the variances of the two groups are equal. 
If we have a good reason to assume equal variances, then s12\u00a0and s22\u00a0will just be the same (combined, or pooled) variance s2, and the formula will look like this:\n\n\u00a0\n\n\u00a0\" id=\"return-footnote-1137-2\" href=\"#footnote-1137-2\" aria-label=\"Footnote 2\"><sup class=\"footnote\">[2]<\/sup><\/a><\/p>\n<p>&nbsp;<\/p>\n<p>Note that unlike the single value case where the <em>df=N<\/em>-1, when working with a difference of means of two groups the <em>df=N<\/em>-2, where <em>N<\/em> is the combined size of the two groups.<\/p>\n<p>&nbsp;<\/p>\n<p>Before your eyes glaze over (completely), rest assured that SPSS calculates this for you; I only provide it here to show you that the logic of hypothesis testing is the same, only the formulas change to accommodate the testing of a <em>difference of means<\/em> rather than a single mean.<\/p>\n<p>&nbsp;<\/p>\n<p>From this point on, it&#8217;s easy: you only need to check the <em>p<\/em>-value of the <em>t<\/em>-value you have obtained (given the specific <em>df<\/em>)<a class=\"footnote\" title=\"You can do that through an online p-value calculator for the t-distribution like this one here: https:\/\/www.socscistatistics.com\/pvalues\/tdistribution.aspx.\" id=\"return-footnote-1137-3\" href=\"#footnote-1137-3\" aria-label=\"Footnote 3\"><sup class=\"footnote\">[3]<\/sup><\/a>, and compare it to the significance level, and <em>voila<\/em> &#8212; you have yourself a significance test!<\/p>\n<p>&nbsp;<\/p>\n<p>Let&#8217;s see how this all works out in an example.\u00a0 A few sections back I promised to test the gender differences in average income, didn&#8217;t I?<\/p>\n<p>&nbsp;<\/p>\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<p class=\"textbox__title\"><em>Example 9.1 Testing Gender Differences in Average Income, NHS 
2011\u00a0<\/em><\/p>\n<\/header>\n<div class=\"textbox__content\">\n<p>&nbsp;<\/p>\n<p>As in Example 7.2 in Section 7.2.1, I use a random sample of about 3 percent of the entire <em>NHS 2011<\/em> data, this time resulting in <em>N<\/em>=21,902<a class=\"footnote\" title=\"Since I use a new random sub-sample of the data, you can consider this an indirect illustration of sampling variation. For comparison of sample statistics as well as variable description, refer back to Example 7.2\" id=\"return-footnote-1137-4\" href=\"#footnote-1137-4\" aria-label=\"Footnote 4\"><sup class=\"footnote\">[4]<\/sup><\/a>.<\/p>\n<p>&nbsp;<\/p>\n<p>We are still interested in whether women and men on average earn differently per year, i.e., whether <em>gender<\/em> affects <em>income<\/em>:<\/p>\n<ul>\n<li>H<sub>0<\/sub>: The average annual income of women and men is the same, <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-028bfaba37d400b051e55fdb48005b02_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#109;&#117;&#95;&#109;&#32;&#61;&#92;&#109;&#117;&#95;&#102;\" title=\"Rendered by QuickLaTeX.com\" height=\"14\" width=\"66\" style=\"vertical-align: -6px;\" \/><\/li>\n<li>H<sub>a<\/sub>: The average annual income of women and men is different, <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-d863504889fc2cc3ad0a862f93b92fe6_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#109;&#117;&#95;&#109;&#32;&#92;&#110;&#101;&#113;&#92;&#109;&#117;&#95;&#102;\" title=\"Rendered by QuickLaTeX.com\" height=\"19\" width=\"66\" style=\"vertical-align: -6px;\" \/><\/li>\n<\/ul>\n<p>There are 11,323 women (<em>N<sub>f<\/sub><\/em>=11,323) and 10,579 men (<em>N<sub>m<\/sub><\/em>=10,579) in the sample. 
The men earn an average of &#36;48,113 (<img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-b9caa8111afec3cbdf29141a7bd91203_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#111;&#118;&#101;&#114;&#108;&#105;&#110;&#101;&#123;&#120;&#125;&#95;&#109;&#32;&#61;&#52;&#56;&#49;&#49;&#51;\" title=\"Rendered by QuickLaTeX.com\" height=\"15\" width=\"92\" style=\"vertical-align: -3px;\" \/>) and women earn an average of &#36;31,519 (<img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-c115e6a516f5df6ecbcb9877dc1db6ff_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#111;&#118;&#101;&#114;&#108;&#105;&#110;&#101;&#123;&#120;&#125;&#95;&#102;&#32;&#61;&#51;&#49;&#44;&#53;&#50;&#57;\" title=\"Rendered by QuickLaTeX.com\" height=\"19\" width=\"95\" style=\"vertical-align: -6px;\" \/>).\u00a0 The respective standard deviations are &#36;68214 for men (<img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-84ec7ef0c9d51a37ad85b8d164eb449a_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#115;&#95;&#109;&#32;&#61;&#54;&#56;&#50;&#49;&#52;\" title=\"Rendered by QuickLaTeX.com\" height=\"15\" width=\"90\" style=\"vertical-align: -3px;\" \/>) and &#36;34,760 for women (<img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-e3de2a2b6bf43bdd8b49c4610e30cb67_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#115;&#95;&#102;&#61;&#51;&#52;&#55;&#54;&#48;\" title=\"Rendered by QuickLaTeX.com\" height=\"19\" width=\"86\" style=\"vertical-align: -6px;\" \/>).<\/p>\n<p>&nbsp;<\/p>\n<p>The difference of means is therefore:<\/p>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" 
src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-6834ae006b60030debd3cd21a3534aba_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#111;&#118;&#101;&#114;&#108;&#105;&#110;&#101;&#123;&#120;&#125;&#95;&#109;&#32;&#45;&#92;&#111;&#118;&#101;&#114;&#108;&#105;&#110;&#101;&#123;&#120;&#125;&#95;&#102;&#32;&#61;&#52;&#56;&#49;&#49;&#51;&#45;&#51;&#49;&#53;&#49;&#57;&#61;&#49;&#54;&#53;&#57;&#52;\" title=\"Rendered by QuickLaTeX.com\" height=\"19\" width=\"266\" style=\"vertical-align: -6px;\" \/><\/p>\n<p>&nbsp;<\/p>\n<p>The question is whether this &#36;16,594 is due to sampling variation (i.e., statistically not different from a population difference of means of &#36;0), or unusual enough that a population difference of &#36;0 is unlikely (i.e., the difference is statistically significant).<\/p>\n<p>&nbsp;<\/p>\n<p>To test this, we need to calculate the standard error of the difference. Once we have the standard error of the difference, we can calculate the <em>t<\/em>-value.<\/p>\n<p>&nbsp;<\/p>\n<p>The standard error of the difference is:<\/p>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-fdd8db7f68c1c5b6177b3dcaa4a98f9d_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#115;&#95;&#92;&#111;&#118;&#101;&#114;&#108;&#105;&#110;&#101;&#123;&#120;&#125;&#95;&#109;&#45;&#92;&#111;&#118;&#101;&#114;&#108;&#105;&#110;&#101;&#123;&#120;&#125;&#95;&#102;\" title=\"Rendered by QuickLaTeX.com\" height=\"15\" width=\"53\" style=\"vertical-align: -7px;\" \/> = <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-30cebb13cd224c24393cbc392243adbb_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" 
alt=\"&#92;&#115;&#113;&#114;&#116;&#123;&#92;&#102;&#114;&#97;&#99;&#123;&#115;&#95;&#109;&#94;&#50;&#125;&#123;&#78;&#95;&#109;&#125;&#43;&#92;&#102;&#114;&#97;&#99;&#123;&#115;&#95;&#102;&#94;&#50;&#125;&#123;&#78;&#95;&#102;&#125;&#125;&#61;&#92;&#115;&#113;&#114;&#116;&#123;&#92;&#102;&#114;&#97;&#99;&#123;&#54;&#56;&#50;&#49;&#52;&#94;&#50;&#125;&#123;&#49;&#48;&#53;&#55;&#57;&#125;&#43;&#92;&#102;&#114;&#97;&#99;&#123;&#51;&#52;&#55;&#54;&#48;&#94;&#50;&#125;&#123;&#49;&#49;&#51;&#50;&#51;&#125;&#125;&#61;&#92;&#115;&#113;&#114;&#116;&#123;&#52;&#51;&#57;&#56;&#52;&#56;&#43;&#49;&#48;&#54;&#55;&#48;&#56;&#125;&#61;&#55;&#51;&#57;\" title=\"Rendered by QuickLaTeX.com\" height=\"43\" width=\"457\" style=\"vertical-align: -14px;\" \/><\/p>\n<p>&nbsp;<\/p>\n<p>The <em>t<\/em>-value is then:<\/p>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-e3311bda09abaed4d5336232e43b7bf8_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#116;&#61;&#92;&#102;&#114;&#97;&#99;&#123;&#92;&#111;&#118;&#101;&#114;&#108;&#105;&#110;&#101;&#123;&#120;&#125;&#95;&#109;&#32;&#45;&#92;&#111;&#118;&#101;&#114;&#108;&#105;&#110;&#101;&#123;&#120;&#125;&#95;&#102;&#125;&#123;&#92;&#115;&#95;&#40;&#92;&#111;&#118;&#101;&#114;&#108;&#105;&#110;&#101;&#123;&#120;&#125;&#95;&#109;&#45;&#92;&#111;&#118;&#101;&#114;&#108;&#105;&#110;&#101;&#123;&#120;&#125;&#95;&#102;&#41;&#125;&#61;&#92;&#102;&#114;&#97;&#99;&#123;&#49;&#54;&#53;&#57;&#52;&#125;&#123;&#55;&#51;&#57;&#125;&#61;&#50;&#50;&#46;&#52;&#52;&#54;\" title=\"Rendered by QuickLaTeX.com\" height=\"29\" width=\"226\" style=\"vertical-align: -11px;\" \/><\/p>\n<p>&nbsp;<\/p>\n<p>Given the large <em>N<\/em>, even just looking at the <em>t<\/em>-value should make it clear that the difference is statistically significant &#8212; after all, in a two-tailed test, the <em>t<\/em>-value is significant at 1.96 and on 
(for\u00a0<em>\u03b1<\/em>=0.05) and at 2.58 and on (for\u00a0<em>\u03b1<\/em>=0.01).<\/p>\n<p>&nbsp;<\/p>\n<p>Still, this is not the way to report a test &#8212; this is: <strong>With <em>t<\/em>=22.446, <em>df<\/em>=21,900, and <em>p<\/em>=0.000<\/strong><a class=\"footnote\" title=\"You can check this with a p-value calculator; SPSS reports it too.\" id=\"return-footnote-1137-5\" href=\"#footnote-1137-5\" aria-label=\"Footnote 5\"><sup class=\"footnote\">[5]<\/sup><\/a><strong>, that is, <em>p<\/em>&lt;0.001<\/strong><a class=\"footnote\" title=\"That is, the probability of observing a difference of $16,594 in the sample if there were no difference in the population is smaller than 0.1%.\" id=\"return-footnote-1137-6\" href=\"#footnote-1137-6\" aria-label=\"Footnote 6\"><sup class=\"footnote\">[6]<\/sup><\/a><strong>, we have enough evidence to reject the null hypothesis. Indeed, we can conclude with 99.99% certainty that there is a statistically significant difference between the average annual income of men and women (i.e., that the difference exists in the population).<\/strong><\/p>\n<p>&nbsp;<\/p>\n<p>We can check this with a confidence interval too, again substituting the difference in place of a single value<a class=\"footnote\" title=\"I hope you remember that 95% CI: .\" id=\"return-footnote-1137-7\" href=\"#footnote-1137-7\" aria-label=\"Footnote 7\"><sup class=\"footnote\">[7]<\/sup><\/a>:<\/p>\n<p>&nbsp;<\/p>\n<p>95% CI: <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-c2f6abae14fbbcfb7644e5e12578cb7b_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" 
alt=\"&#92;&#111;&#118;&#101;&#114;&#108;&#105;&#110;&#101;&#123;&#120;&#125;&#95;&#109;&#32;&#45;&#32;&#92;&#111;&#118;&#101;&#114;&#108;&#105;&#110;&#101;&#123;&#120;&#125;&#95;&#102;&#32;&#92;&#112;&#109;&#32;&#49;&#46;&#57;&#54;&#92;&#116;&#105;&#109;&#101;&#115;&#32;&#115;&#95;&#92;&#111;&#118;&#101;&#114;&#108;&#105;&#110;&#101;&#123;&#120;&#125;&#95;&#109;&#45;&#92;&#111;&#118;&#101;&#114;&#108;&#105;&#110;&#101;&#123;&#120;&#125;&#95;&#102;\" title=\"Rendered by QuickLaTeX.com\" height=\"19\" width=\"192\" style=\"vertical-align: -7px;\" \/> = <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-e6ea2d087da28f9014283702d0fa8da5_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#49;&#54;&#53;&#57;&#52;&#32;&#92;&#112;&#109;&#32;&#49;&#46;&#57;&#54;&#32;&#92;&#116;&#105;&#109;&#101;&#115;&#32;&#55;&#51;&#57;&#32;&#61;&#32;&#49;&#54;&#53;&#57;&#52;&#32;&#92;&#112;&#109;&#32;&#49;&#52;&#52;&#56;\" title=\"Rendered by QuickLaTeX.com\" height=\"14\" width=\"270\" style=\"vertical-align: -1px;\" \/> = <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-3a7f0196c0994688e3d26648211efb41_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#61;&#32;&#40;&#49;&#53;&#49;&#52;&#53;&#59;&#32;&#49;&#56;&#48;&#52;&#51;&#41;\" title=\"Rendered by QuickLaTeX.com\" height=\"18\" width=\"129\" style=\"vertical-align: -4px;\" \/><\/p>\n<p>&nbsp;<\/p>\n<p>That is, we can say that <b>the difference of average annual incomes between men and women will be between &#36;15,145 and &#36;18,043 with 95% certainty; or that 19 out of 20 such studies will find a difference of\u00a0&#36;16,594\u00a0<img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-7e6df2d33e1750d02a2b0cb14612da8f_l3.png\" class=\"ql-img-inline-formula 
quicklatex-auto-format\" alt=\"&#92;&#112;&#109;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"13\" style=\"vertical-align: 0px;\" \/> &#36;1,448.<\/b> (We also see the correspondence with hypothesis testing: since the interval\u00a0does <em>not<\/em> contain 0, 0 is not a plausible value for the difference.)<\/p>\n<p>&nbsp;<\/p>\n<p>Inference is not doing too badly, no?<\/p>\n<\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<p>Again, SPSS will provide all the calculations, but I advise you to still test your understanding of the procedure with the following exercise.<\/p>\n<p>&nbsp;<\/p>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<p class=\"textbox__title\"><em>Do It!! 9.1 Gender Differences in Age of Actors in Main Roles<\/em><\/p>\n<\/header>\n<div class=\"textbox__content\">\n<p>&nbsp;<\/p>\n<p>Studies find that due to the gendered social construction of aging (i.e., women are considered &#8220;older&#8221; and &#8220;mature&#8221; at younger ages than men), male actors are frequently paired with much younger female actors (Buchanan 2013; Follows 2015). For example, the average age of male and female Academy Award nominees is telling: in the Best Actor\/Actress categories, the average age of men is 43.4 years while the average age of women is 37.2 years (Beckwith &amp; Hester, 2018 [http:\/\/thedataface.com\/2018\/03\/culture\/oscar-nominees-age]).<\/p>\n<p>&nbsp;<\/p>\n<p>Let&#8217;s say that you want to investigate this phenomenon yourself. You randomly select 100 male and 100 female\u00a0<span style=\"font-size: 1rem\">Academy Award nominees and record their age at nomination. You find that men&#8217;s average age is 45 years and women&#8217;s is 36 years, with standard deviations of 15 years for men and 20 years for women. Test the hypothesis that the average age for women is different from that of men for the population of all Best Actor\/Actress Oscar nominees. 
Create a 95% CI for the difference to see its correspondence with the hypothesis test.<\/span><\/p>\n<p>&nbsp;<\/p>\n<\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<p>Now that you understand the principle of testing the difference of two means, let&#8217;s see what we can do about non-binary discrete variables, in the next section. The SPSS guidelines for doing a <em>t<\/em>-test are below.<\/p>\n<p>&nbsp;<\/p>\n<div class=\"textbox textbox--key-takeaways\">\n<header class=\"textbox__header\">\n<p class=\"textbox__title\"><em>SPSS Tip 9.1 The t-test<\/em><\/p>\n<\/header>\n<div class=\"textbox__content\">\n<ul>\n<li>From the <em>Main Menu<\/em>, select <em>Analyze<\/em>, and from the pull-down menu, click on <em>Compare Means<\/em> and <em>Independent Samples T Test<\/em>;<\/li>\n<li>Select your continuous variable from the list of variables on the left and, using the top arrow, move it to the <em>Test Variable(s)<\/em> empty space on the right;<\/li>\n<li>Select your discrete variable from the list of variables on the left and, using the bottom arrow, move it to the <em>Grouping Variable<\/em> empty space on the right;<\/li>\n<li>Click on <em>Define Groups<\/em>, and in the new window, keep <em>Use specified values<\/em> selected; in the empty spaces for <em>Group 1<\/em> and <em>Group 2<\/em>, enter the <em>numeric<\/em> values<a class=\"footnote\" title=\"That would be the &quot;code&quot; -- for example, gender may be coded as &quot;1 female, 2 male&quot;, or &quot;0 male, 1 female&quot;, etc., depending on the dataset. 
You have to know this beforehand; if unsure, go back to Variable View and check.\" id=\"return-footnote-1137-8\" href=\"#footnote-1137-8\" aria-label=\"Footnote 8\"><sup class=\"footnote\">[8]<\/sup><\/a> corresponding to the two categories of your discrete variable; click <em>Continue<\/em>;<\/li>\n<li>In the <em>Independent Samples T Test<\/em> window click <em>Options<\/em>&#8230;; you can request a specific confidence interval in the new window (the default is 95%); click <em>Continue<\/em>;<\/li>\n<li>Click <em>OK<\/em> once back in the <em>Independent Samples T Test<\/em> window;<\/li>\n<li>SPSS will produce two tables in the <em>Output<\/em> window: a <em>Group Statistics<\/em> one (where you can see the sample size, mean, standard deviation, and standard error for each group, i.e., each category of the discrete variable), and an <em>Independent Samples Test<\/em>\u00a0one (where you can find the <em>t<\/em>-value, <em>df<\/em>, <em>p<\/em>-value, mean difference, standard error of the difference, and the requested confidence interval)<a class=\"footnote\" title=\"The table provides two versions of the test: with and without equal variances assumed. Which one you should use depends on the size of the two groups' variances. If the variance of one group is twice (or more) as big as the other group's variance (like in Example 9.1 above, where the men's variance was much larger than the women's one), use the test results in the bottom row, &quot;equal variances not assumed&quot;. If the two groups' variances are relatively similar, you can use the top row, &quot;equal variances assumed&quot;. You don't have to decide on your own, as SPSS provides a convenient indication for which one is better to use, under Levene's Test\/F for comparing variances. 
If the F-test is significant (i.e., p\u22640.05), the variances are too different and using the bottom row is better; if the F-test is non-significant (i.e., p&gt;0.05) you can assume the variances are equal and use the top row of results.\" id=\"return-footnote-1137-9\" href=\"#footnote-1137-9\" aria-label=\"Footnote 9\"><sup class=\"footnote\">[9]<\/sup><\/a>.<\/li>\n<\/ul>\n<\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<hr class=\"before-footnotes clear\" \/><div class=\"footnotes\"><ol><li id=\"footnote-1137-1\">I hope you have not forgotten that <img src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-d7b246c692b62bc4c366d4720b038763_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#122;&#61;&#92;&#102;&#114;&#97;&#99;&#123;&#92;&#111;&#118;&#101;&#114;&#108;&#105;&#110;&#101;&#123;&#120;&#125;&#45;&#92;&#109;&#117;&#125;&#123;&#92;&#115;&#105;&#103;&#109;&#97;&#95;&#92;&#111;&#118;&#101;&#114;&#108;&#105;&#110;&#101;&#123;&#120;&#125;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"24\" width=\"62\" style=\"vertical-align: -8px;\" \/>, where the standard error <img src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-de5382c00a55332dd89774492d104d0c_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#115;&#105;&#103;&#109;&#97;&#95;&#92;&#111;&#118;&#101;&#114;&#108;&#105;&#110;&#101;&#123;&#120;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"11\" width=\"19\" style=\"vertical-align: -3px;\" \/> <img src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-37f5b9900520c65e863b86a3c1f7ce1f_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#61;&#92;&#102;&#114;&#97;&#99;&#123;&#92;&#115;&#105;&#103;&#109;&#97;&#125;&#123;&#92;&#115;&#113;&#114;&#116;&#123;&#78;&#125;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"19\" width=\"34\" style=\"vertical-align: -6px;\" \/>. 
<a href=\"#return-footnote-1137-1\" class=\"return-footnote\" aria-label=\"Return to footnote 1\">&crarr;<\/a><\/li><li id=\"footnote-1137-2\">The more observant of you will notice that the squared standard deviations of the two groups, i.e., the\u00a0<em>s<sub>1<\/sub><sup>2<\/sup><\/em>\u00a0and <em>s<sub>2<\/sub><sup>2<\/sup><\/em> here, are of course the groups' variances (which we need if we are to have them under the square root). In this version of the formula, the groups are taken to have <em>unequal<\/em> variances, which is a more conservative assumption than assuming the variances of the two groups are equal. If we have a good reason to assume <em>equal<\/em> variances, then <em>s<sub>1<\/sub><sup>2<\/sup><\/em> and <em>s<sub>2<\/sub><sup>2<\/sup><\/em>\u00a0will just be the same (combined, or pooled) variance <em>s<sup>2<\/sup><\/em>, and the formula will look like this:\r\n\r\n&nbsp;\r\n\r\n<img src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-dd0ea86d993781d12de4a812e7ba4d99_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#116;&#61;&#92;&#102;&#114;&#97;&#99;&#123;&#92;&#111;&#118;&#101;&#114;&#108;&#105;&#110;&#101;&#123;&#120;&#95;&#49;&#125;&#32;&#45;&#92;&#111;&#118;&#101;&#114;&#108;&#105;&#110;&#101;&#123;&#120;&#95;&#50;&#125;&#125;&#123;&#115;&#92;&#115;&#113;&#114;&#116;&#123;&#92;&#102;&#114;&#97;&#99;&#123;&#49;&#125;&#123;&#78;&#95;&#49;&#125;&#43;&#92;&#102;&#114;&#97;&#99;&#123;&#49;&#125;&#123;&#78;&#95;&#50;&#125;&#125;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"37\" width=\"103\" style=\"vertical-align: -21px;\" \/>\u00a0 <a href=\"#return-footnote-1137-2\" class=\"return-footnote\" aria-label=\"Return to footnote 2\">&crarr;<\/a><\/li><li id=\"footnote-1137-3\">You can do that through an online <em>p<\/em>-value calculator for the <em>t<\/em>-distribution like this one: <a 
href=\"https:\/\/www.socscistatistics.com\/pvalues\/tdistribution.aspx\">https:\/\/www.socscistatistics.com\/pvalues\/tdistribution.aspx<\/a>. <a href=\"#return-footnote-1137-3\" class=\"return-footnote\" aria-label=\"Return to footnote 3\">&crarr;<\/a><\/li><li id=\"footnote-1137-4\">Since I use a new random sub-sample of the data, you can consider this an indirect illustration of sampling variation. For comparison of sample statistics as well as variable description, refer back to Example 7.2. <a href=\"#return-footnote-1137-4\" class=\"return-footnote\" aria-label=\"Return to footnote 4\">&crarr;<\/a><\/li><li id=\"footnote-1137-5\">You can check this with a <em>p<\/em>-value calculator; SPSS reports it too. <a href=\"#return-footnote-1137-5\" class=\"return-footnote\" aria-label=\"Return to footnote 5\">&crarr;<\/a><\/li><li id=\"footnote-1137-6\">That is, the probability of observing a difference of &#36;16,594 in the sample if there were no difference in the population is smaller than 0.1%. <a href=\"#return-footnote-1137-6\" class=\"return-footnote\" aria-label=\"Return to footnote 6\">&crarr;<\/a><\/li><li id=\"footnote-1137-7\">I hope you remember that 95% CI: <img src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-69c9c8db1bffaad9fdbe7361c0370d0c_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#111;&#118;&#101;&#114;&#108;&#105;&#110;&#101;&#123;&#120;&#125;&#32;&#92;&#112;&#109;&#32;&#49;&#46;&#57;&#54;&#92;&#116;&#105;&#109;&#101;&#115;&#32;&#115;&#95;&#92;&#111;&#118;&#101;&#114;&#108;&#105;&#110;&#101;&#123;&#120;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"15\" width=\"102\" style=\"vertical-align: -3px;\" \/>.  
<a href=\"#return-footnote-1137-7\" class=\"return-footnote\" aria-label=\"Return to footnote 7\">&crarr;<\/a><\/li><li id=\"footnote-1137-8\">That would be the \"code\" -- for example, <em>gender<\/em> may be coded as \"1 female, 2 male\", or \"0 male, 1 female\", etc., depending on the dataset. You have to know this beforehand; if unsure, go back to Variable View and check. <a href=\"#return-footnote-1137-8\" class=\"return-footnote\" aria-label=\"Return to footnote 8\">&crarr;<\/a><\/li><li id=\"footnote-1137-9\">The table provides two versions of the test: <em>with<\/em> and <em>without<\/em> equal variances assumed. Which one you should use depends on the size of the two groups' variances. If the variance of one group is twice (or more) as big as the other group's variance (like in Example 9.1 above, where the men's variance was much larger than the women's one), use the test results in the bottom row, \"equal variances not assumed\". If the two groups' variances are relatively similar, you can use the top row, \"equal variances assumed\". You don't have to decide on your own, as SPSS provides a convenient indication for which one is better to use, under <em>Levene's Test\/F<\/em> for comparing variances. If the <em>F<\/em>-test is significant (i.e., <em>p<\/em>\u22640.05), the variances are too different and using the bottom row is better; if the <em>F<\/em>-test is non-significant (i.e., <em>p<\/em>&gt;0.05) you can assume the variances are equal and use the top row of results. 
<a href=\"#return-footnote-1137-9\" class=\"return-footnote\" aria-label=\"Return to footnote 9\">&crarr;<\/a><\/li><\/ol><\/div>","protected":false},"author":533,"menu_order":1,"template":"","meta":{"pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[],"contributor":[],"license":[],"class_list":["post-1137","chapter","type-chapter","status-publish","hentry"],"part":120,"_links":{"self":[{"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/pressbooks\/v2\/chapters\/1137","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/wp\/v2\/users\/533"}],"version-history":[{"count":25,"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/pressbooks\/v2\/chapters\/1137\/revisions"}],"predecessor-version":[{"id":2108,"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/pressbooks\/v2\/chapters\/1137\/revisions\/2108"}],"part":[{"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/pressbooks\/v2\/parts\/120"}],"metadata":[{"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/pressbooks\/v2\/chapters\/1137\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/wp\/v2\/media?parent=1137"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/pressbooks\/v2\/chapter-type?post=1137"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/wp\/v2\/contributor?post=1137"},{"taxonomy":"license","embeddable":true,"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/wp\/v2\/license?post=1137"}],"curies":[{"name":"wp","href":"ht
tps:\/\/api.w.org\/{rel}","templated":true}]}}