{"id":126,"date":"2018-10-31T18:05:15","date_gmt":"2018-10-31T22:05:15","guid":{"rendered":"https:\/\/pressbooks.bccampus.ca\/simplestats\/?post_type=chapter&#038;p=126"},"modified":"2019-11-02T21:14:17","modified_gmt":"2019-11-03T01:14:17","slug":"9-3-the-chi-square","status":"publish","type":"chapter","link":"https:\/\/pressbooks.bccampus.ca\/simplestats\/chapter\/9-3-the-chi-square\/","title":{"raw":"9.3 Between Two Discrete Variables: The \u03c72, Part 1","rendered":"9.3 Between Two Discrete Variables: The \u03c72, Part 1"},"content":{"raw":"[latexpage]\r\n\r\nAs in the previous section, here you need to recall how we examine potential association between two variables both treated as discrete (Section 7.2.2, <a href=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/chapter\/7-2-2-between-two-discrete-variables\/\">https:\/\/pressbooks.bccampus.ca\/simplestats\/chapter\/7-2-2-between-two-discrete-variables\/<\/a>). We described such associations through contingency tables, reporting differences of proportions as appropriate.\r\n\r\n&nbsp;\r\n\r\nWe can start with the simplest, binary case: when the discrete variables have two groups each. Then we compare the groups of interest (categories of one variable) on one of the categories of the other variable. (The example in Chapter 7 we used was to compare the percentage of first-year students who like the campus cafeteria to the percentage of second-year students who do.)\r\n\r\n&nbsp;\r\n\r\n<strong>The <em>t<\/em>-test for testing difference of <em>two<\/em> proportions.<\/strong> When we have only two proportions (or percentages) to compare, we can actually use the same <em>t<\/em>-test we used for testing differences of means, again treating the<em> difference<\/em> as a single, normally distributed statistic. 
Since we have categorical variables, however, and no standard deviations\/variances, we resort to measuring population variability by \u03c0(1-\u03c0) and sample variability by\u00a0<em>p<\/em>(1-<em>p<\/em>)[footnote]Do not forget that <em>p<\/em> here stands for <em>proportion<\/em>, not <em>probability\/p<\/em>-<em>value<\/em>.[\/footnote]. (See Section 6.7.2, <a href=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/chapter\/6-7-2-confidence-intervals-for-proportions\/\">https:\/\/pressbooks.bccampus.ca\/simplestats\/chapter\/6-7-2-confidence-intervals-for-proportions\/<\/a>.) We can thus simply substitute that into the formula for <em>z<\/em>:\r\n\r\n&nbsp;\r\n\r\n$z=\\frac{(p_1 -p_2)-(\\pi_1 -\\pi_2 )}{\\sqrt{\\frac{\\pi_1(1-\\pi_1)}{N_1}+\\frac{\\pi_2(1-\\pi_2)}{N_2}}}$\r\n\r\n&nbsp;\r\n\r\nwhere, of course, under the null hypothesis $(\\pi_1 -\\pi_2 )=0$. Then, using the sample proportions leaves us with <em>t<\/em>:\r\n\r\n&nbsp;\r\n\r\n$t=\\frac{(p_1 -p_2)}{\\sqrt{\\frac{p_1(1-p_1)}{N_1}+\\frac{p_2(1-p_2)}{N_2}}}$\r\n\r\n&nbsp;\r\n\r\nAgain, under the null hypothesis the two groups' proportions are assumed to be the same, so we can replace both with the pooled (overall) sample proportion <em>p<\/em>, which effectively gives us:\r\n\r\n&nbsp;\r\n\r\n$t=\\frac{(p_1 -p_2)}{\\sqrt{p(1-p)(\\frac{1}{N_1}+\\frac{1}{N_2})}}$\r\n\r\n&nbsp;\r\n\r\nLet's revisit the cafeteria-preferences example from Section 7.2.2 to see how the <em>t<\/em>-test for testing difference of proportions works.\r\n\r\n&nbsp;\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<p class=\"textbox__title\"><em>Example 9.3 Do You Like the Campus Cafeteria? (A t-Test)<\/em><\/p>\r\n\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n\r\n&nbsp;\r\n\r\nIn Chapter 7 we imagined that you asked 35 students in your class[footnote]Note that this of course is not a random sample; we are using it here only for illustrating how hypothesis testing works so we are effectively pretending it is random. 
In a real-life study, you should not use non-probability samples for statistical inference.[\/footnote] whether they liked the campus cafeteria: 12 of your classmates said <em>yes<\/em> (i.e., 34.3 percent), 7 (out of 15) first-years and 5 (out of 20) second-years (46.7 percent of all first-years and 25 percent of all second-years, respectively).\r\n\r\n&nbsp;\r\n\r\nWe want to know whether the difference in proportions observed in the sample (0.467-0.25=0.217) is statistically significant: can it be generalized to a larger student population, or is it due to ordinary sampling variability?\r\n<ul>\r\n \t<li>H<sub>0<\/sub>: The proportion of first year students who like the cafeteria is the same as the proportion of second year students who do; $\\pi_1=\\pi_2$.<\/li>\r\n \t<li>H<sub>a<\/sub>:\u00a0The proportion of first year students who like the cafeteria is different than the proportion of second year students who do; $\\pi_1\\neq\\pi_2$.<\/li>\r\n<\/ul>\r\nSubstituting these numbers in the formula we have:\r\n\r\n&nbsp;\r\n\r\n$t=\\frac{(p_1 -p_2)}{\\sqrt{p(1-p)(\\frac{1}{N_1}+\\frac{1}{N_2})}}=\\frac{0.467-0.25}{\\sqrt{0.343(1-0.343)(\\frac{1}{15}+\\frac{1}{20})}}=\\frac{0.217}{0.162}=1.34$\r\n\r\n&nbsp;\r\n\r\n<strong>With a <em>t<\/em>=1.34, <em>df<\/em>=34, and <em>p<\/em>=0.189 (i.e., <em>p<\/em>&gt;0.05) we <em>fail<\/em> to reject the null hypothesis: at this point we do not have enough evidence to conclude there is a difference between the proportions of first and second year students who like the campus cafeteria. 
The 21.7 percentage points difference is not statistically significant, and has a high enough probability of being due to random chance<\/strong>.\r\n\r\n&nbsp;\r\n\r\nWe can check this with a confidence interval too:\r\n<ul>\r\n \t<li>95% CI: $(p_1 -p_2)\\pm1.96\\times\\sqrt{\\frac{p_1(1-p_1)}{N_1}+\\frac{p_2(1-p_2)}{N_2}}=0.217\\pm1.96\\times\\sqrt{\\frac{0.467(0.533)}{15}+\\frac{0.25(0.75)}{20}}=0.217\\pm0.316=(-0.099; 0.533)$<\/li>\r\n<\/ul>\r\n<strong>In other words, the difference between the proportion of first years and the proportion of second years who like the cafeteria could be anywhere between -9.9 percentage points and 53.3 percentage points with 95% confidence (i.e., about 19 out of 20 intervals built this way from such samples would capture the true difference; note how wide this one is).<\/strong> The difference can be in favour of second years or in favour of the first years (notice the negative lower bound); it can even be 0. Thus, <strong>since a difference of 0 (i.e., no difference) is a plausible value, we cannot reject the null hypothesis. We conclude that we do not have enough evidence of an association between year of study and opinion on the campus cafeteria.<\/strong>\r\n\r\n<\/div>\r\n<\/div>\r\n&nbsp;\r\n\r\nAdmittedly, the formulas look scary but if you have followed through the example above, you have seen by now the actual calculation is quite simple. You can try it out and see for yourself.\r\n\r\n&nbsp;\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<p class=\"textbox__title\"><em>Do It! 9.2 Vegetarianism\/Veganism among Canadian and International Students<\/em><\/p>\r\n\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n\r\n&nbsp;\r\n\r\nImagine you are interested in exploring whether there is a difference between Canadian and international students in your university when it comes to dietary preferences like vegetarianism and veganism. 
With your institution's registrar's assistance, you take a random sample of 100 students and poll them on 1) whether they are a Canadian or an international student, and 2) whether they are vegetarian\/vegan or not.\r\n\r\n&nbsp;\r\n\r\nYou find that you have 70 Canadian and 30 international students in your sample. Out of the Canadian students, 15 (or 21.4 percent) are vegetarian or vegan; out of the international students 5 (or 16.7 percent) have such dietary restrictions.\r\n\r\n&nbsp;\r\n\r\nCheck if the observed <em>in the sample<\/em> difference in proportions is generalizable to the larger student population by testing the hypothesis whether dietary preferences are associated with country of origin. Create a 95% confidence interval for that difference, and substantively interpret what you have found with both the <em>t<\/em>-test and the confidence interval.\r\n\r\n&nbsp;\r\n\r\n<sub>Useful hint 1: Among the 100, there are 20 vegan\/vegetarian students in total.<\/sub>\r\n\r\n<sub>Useful hint 2: You can find the <em>p<\/em>-value of your <em>t<\/em>-statistic here: <a href=\"https:\/\/www.socscistatistics.com\/pvalues\/tdistribution.aspx\">https:\/\/www.socscistatistics.com\/pvalues\/tdistribution.aspx<\/a>.<\/sub>\r\n\r\n<\/div>\r\n<\/div>\r\n&nbsp;\r\n\r\nOf course, discrete variables do not have to be binary: they can have more than two categories each. Just like in the case of a continuous and a discrete variables' association discussed in the previous section where non-binary variables required the use of an <em>F<\/em>-test, there is a different test for testing the association between any two discrete variables, regardless of their respective number of categories (i.e., not just binary ones).\r\n\r\n&nbsp;\r\n\r\n<strong>The\u00a0<em>\u03c7<sup>2<\/sup><\/em>-test for testing associations between discrete variables.\u00a0<\/strong>The\u00a0<em>\u03c7<sup>2<\/sup><\/em>-test[footnote]This is the small-case Greek letter <em>h<\/em>, <em>\u03c7<\/em>. 
<em>It is pronounced [KHAI]<\/em>, but since it is transliterated as <em>chi<\/em>, many people incorrectly pronounce it as [CHAI] or even [CHEE]. The test itself is called chi-squared test (again, pronounced as [KHAI- squared] not [CHAI- or CHEE-squared]).[\/footnote] (or Pearson's\u00a0<em>\u03c7<sup>2<\/sup><\/em>-test) is based on <strong>a comparison between the\u00a0<em>observed<\/em> and the\u00a0<em>expected<\/em> cell values in a contingency table.<\/strong>\r\n\r\n&nbsp;\r\n\r\nThe observed values are the cell counts you see in a contingency table given a specific dataset. The expected values, on the other hand, are the counts we would <em>expect<\/em> to see <em>if there were no pattern\/association in the data<\/em>. In other words, the test effectively compares the sample to a null-hypothesis-like hypothetical distribution of the observations across the cells. Thus, logically, <strong>if there is a relatively large difference between the observed and the expected values, we can take that as evidence <em>against<\/em> the null hypothesis and reject it. If, however, the difference between observed and expected values is relatively small, the evidence against the null hypothesis will be insufficient and we would <em>fail<\/em> to reject it.<\/strong>\r\n\r\n&nbsp;\r\n\r\nThe actual way the\u00a0<em>\u03c7<sup>2<\/sup><\/em><sup>\u00a0<\/sup>is calculated is this:\r\n\r\n&nbsp;\r\n\r\n$$\\chi^2=\\Sigma\\frac{(f_o -f_e)^2}{f_e}$$\r\n\r\n&nbsp;\r\n\r\nwhere <em>f<sub>o<\/sub><\/em> is the observed frequency (count) and <em>f<sub>e<\/sub><\/em> is the expected frequency count of a given cell.\r\n\r\n&nbsp;\r\n\r\nThe formula looks more complicated than it is (don't they always?) 
-- it only asks us to calculate the difference between the observed and the expected count <em>for each cell<\/em>, square it and divide it by the expected count.\u00a0 Once we have done this for all cells, we need only add the resulting numbers together to get the\u00a0<em>\u03c7<sup>2<\/sup><\/em>\u00a0.\r\n\r\n&nbsp;\r\n\r\nConsidering that the\u00a0<em>\u03c7<sup>2<\/sup><\/em><sup>\u00a0<\/sup>is then a sum of as many numbers as there are cells, the larger the table (i.e., the more rows and columns there are), the bigger the resulting\u00a0<em>\u03c7<sup>2<\/sup><\/em><sup>\u00a0<\/sup>will be. To account for that, the\u00a0<em>\u03c7<sup>2<\/sup><\/em><sup>\u00a0<\/sup>too has degrees of freedom, where the <em>df<\/em>=(<em>rows<\/em>-1)(<em>columns<\/em>-1). The\u00a0<em>\u03c7<sup>2\u00a0<\/sup><\/em>follows a\u00a0<em>\u03c7<sup>2<\/sup>-<\/em>distribution, which too provides a <em>p<\/em>-value given specific <em>df<\/em>.\r\n\r\n&nbsp;\r\n\r\n<strong>The hypothesis testing then follows the same steps as the <em>t<\/em>-test and the <em>F<\/em>-test: obtain\u00a0<em>\u03c7<sup>2<\/sup><\/em>-value with specific <em>df,\u00a0<\/em>find its associated\u00a0<em>p<\/em>-value, and finally compare the <em>p<\/em>-value to the pre-selected significance level. If <em>p<\/em>&lt;<em>\u03b1<\/em>, reject the null hypothesis.<\/strong>\r\n\r\n&nbsp;\r\n\r\nTo demonstrate, we will first do a <em>one-way<\/em>\u00a0<em>\u03c7<sup>2<\/sup><\/em><sup>\u00a0<\/sup>calculation, i.e., based on the frequency distribution of just <em>one<\/em> variable. (Of course, if tabulated, this would not be considered a contingency table but a frequency table.)\r\n\r\n&nbsp;\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<p class=\"textbox__title\"><em>Example 9.4 Do You Like The Campus Cafeteria? 
(Univariate \u03c7<sup>2<\/sup>-Test)<\/em><\/p>\r\n\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n\r\n&nbsp;\r\n\r\nTo use the imaginary data from before, we had 12 people who admitted liking the campus cafeteria food out of the 35 polled. (Since we are interested only in one of the variables, here we ignore whether the students who like the cafeteria are first- or second-years.) As such, we have the following table:\r\n\r\n&nbsp;\r\n\r\n<em>Table 9.1\u00a0 Approval of the Campus Cafeteria, Observed Count (Univariate)\u00a0\u00a0<\/em>\r\n<table class=\"lines\" style=\"border-collapse: collapse;width: 50.2841%;height: 85px\" border=\"0\">\r\n<tbody>\r\n<tr style=\"height: 15px\">\r\n<td style=\"width: 2.83286%;height: 15px\"><strong>Yes<\/strong><\/td>\r\n<td style=\"width: 2.83286%;height: 15px\">12<\/td>\r\n<\/tr>\r\n<tr style=\"height: 15px\">\r\n<td style=\"width: 2.83286%;height: 15px\"><strong>No<\/strong><\/td>\r\n<td style=\"width: 2.83286%;height: 15px\">23<\/td>\r\n<\/tr>\r\n<tr style=\"height: 15px\">\r\n<td style=\"width: 2.83286%;height: 15px\"><strong>Total<\/strong><\/td>\r\n<td style=\"width: 2.83286%;height: 15px\">35<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n&nbsp;\r\n\r\nIf you did not know anything about the campus cafeteria and had no observations about it whatsoever -- i.e., had you been an impartial observer, as it were -- wouldn't you expect to see an approximately 50\/50 split of the 35 students into the two categories? After all, there are only two groups, and an unbiased (random) distribution would be exactly like everyone flipping a coin as a manner of deciding in which group they end up. 
Thus, <strong>the expected count here is simply <em>N<\/em> divided by the number of groups\/categories<\/strong> (denoted by <em>k<\/em>):\r\n\r\n&nbsp;\r\n\r\n$f_e=\\frac{N}{k}=\\frac{35}{2}=17.5$\r\n\r\n&nbsp;\r\n\r\nTable 9.2 adds the expected count in brackets next to the observed count.\r\n\r\n&nbsp;\r\n\r\n&nbsp;\r\n\r\n<em>Table 9.2 Approval of the Campus Cafeteria, Observed and Expected<\/em>\u00a0<em>Count (Univariate)<\/em>\r\n<table class=\"lines\" style=\"border-collapse: collapse;width: 50.2841%;height: 85px\" border=\"0\">\r\n<tbody>\r\n<tr style=\"height: 15px\">\r\n<td style=\"width: 2.83286%;height: 15px\"><strong>Yes<\/strong><\/td>\r\n<td style=\"width: 2.83286%;height: 15px\">12\u00a0 \u00a0 (17.5)<\/td>\r\n<\/tr>\r\n<tr style=\"height: 15px\">\r\n<td style=\"width: 2.83286%;height: 15px\"><strong>No<\/strong><\/td>\r\n<td style=\"width: 2.83286%;height: 15px\">23\u00a0 \u00a0 (17.5)<\/td>\r\n<\/tr>\r\n<tr style=\"height: 15px\">\r\n<td style=\"width: 2.83286%;height: 15px\"><strong>Total<\/strong><\/td>\r\n<td style=\"width: 2.83286%;height: 15px\">35<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n&nbsp;\r\n\r\nThen, according to the formula, this is what we have for each of the two groups:\r\n<ul>\r\n \t<li><em>Yes<\/em>-group: $\\frac{(f_o-f_e)^2}{f_e}=\\frac{(12-17.5)^2}{17.5}=\\frac{30.25}{17.5}=1.73$<\/li>\r\n \t<li><em>No<\/em>-group: $\\frac{(f_o-f_e)^2}{f_e}=\\frac{(23-17.5)^2}{17.5}=\\frac{30.25}{17.5}=1.73$<\/li>\r\n<\/ul>\r\nFinally, to get the<em>\u00a0\u03c7<sup>2<\/sup> <\/em>we only need to add these two numbers together:\r\n\r\n&nbsp;\r\n\r\n$\\chi^2=\\Sigma\\frac{(f_o -f_e)^2}{f_e}=\u00a0\\frac{(12-17.5)^2}{17.5}+\\frac{(23-17.5)^2}{17.5}=1.73+1.73=3.46$\r\n\r\n&nbsp;\r\n\r\nThe degrees of freedom in a one-way <em>\u03c7<sup>2<\/sup><\/em>-test is <em>k<\/em>-1, where <em>k<\/em> is the number of categories\/groups. 
In this case we have <em>k<\/em>=2, so <em>df<\/em>=1.\r\n\r\n&nbsp;\r\n\r\n<strong>With a\u00a0<em>\u03c7<sup>2\u00a0<\/sup>=3.46,<\/em><em> df<\/em>=1<\/strong>,<strong> and a <em>p<\/em>=0.06<\/strong>[footnote]You can check the significance of any <em>\u03c7<sup>2\u00a0<\/sup><\/em>with a convenient online calculator, like this one here: <a href=\"https:\/\/www.socscistatistics.com\/pvalues\/chidistribution.aspx\">https:\/\/www.socscistatistics.com\/pvalues\/chidistribution.aspx<\/a>.[\/footnote] (i.e., <em>p<\/em>&gt;0.05), <strong>we fail to reject the null hypothesis. At this time, we do<em> not<\/em> have enough evidence to conclude that the observed distribution of the students is unusual enough to suggest a pattern which is different from random variation around a 50\/50 split. As such, this distribution is <em>not<\/em> statistically significant -- we cannot conclude that the students lean one way or the other in their opinion about the campus cafeteria.<\/strong>\r\n\r\n&nbsp;\r\n\r\n<\/div>\r\n<\/div>\r\n&nbsp;\r\n\r\nCalculating a two-way\u00a0<em>\u03c7<sup>2\u00a0<\/sup><\/em>-- by far the more often used one as it tests associations between <em>two<\/em> variables -- is just as easy, even if it involves calculating more numbers (since in the bivariate case we have more cells; four at the minimum, given a 2x2 cross-tabulation). The next section is devoted to that.\r\n\r\n&nbsp;","rendered":"<p>As in the previous section, here you need to recall how we examine potential association between two variables both treated as discrete (Section 7.2.2, <a href=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/chapter\/7-2-2-between-two-discrete-variables\/\">https:\/\/pressbooks.bccampus.ca\/simplestats\/chapter\/7-2-2-between-two-discrete-variables\/<\/a>). 
We described such associations through contingency tables, reporting differences of proportions as appropriate.<\/p>\n<p>&nbsp;<\/p>\n<p>We can start with the simplest, binary case: when the discrete variables have two groups each. Then we compare the groups of interest (categories of one variable) on one of the categories of the other variable. (The example in Chapter 7 we used was to compare the percentage of first-year students who like the campus cafeteria to the percentage of second-year students who do.)<\/p>\n<p>&nbsp;<\/p>\n<p><strong>The <em>t<\/em>-test for testing difference of <em>two<\/em> proportions.<\/strong> When we have only two proportions (or percentages) to compare, we can actually use the same <em>t<\/em>-test we used for testing differences of means, again treating the<em> difference<\/em> as a single, normally distributed statistic. Since we have categorical variables, however, and no standard deviations\/variances, we resort to measuring population variability by \u03c0(1-\u03c0) and sample variability by\u00a0<em>p<\/em>(1-<em>p<\/em>)<a class=\"footnote\" title=\"Do not forget that p here stands for proportion, not probability\/p-value.\" id=\"return-footnote-126-1\" href=\"#footnote-126-1\" aria-label=\"Footnote 1\"><sup class=\"footnote\">[1]<\/sup><\/a>. (See Section 6.7.2, <a href=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/chapter\/6-7-2-confidence-intervals-for-proportions\/\">https:\/\/pressbooks.bccampus.ca\/simplestats\/chapter\/6-7-2-confidence-intervals-for-proportions\/<\/a>.) 
We can thus simply substitute that into the formula for <em>z<\/em>:<\/p>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-60591dcc35846d5c6688e8f02c6c6190_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#122;&#61;&#92;&#102;&#114;&#97;&#99;&#123;&#40;&#112;&#95;&#49;&#32;&#45;&#112;&#95;&#50;&#41;&#45;&#40;&#92;&#112;&#105;&#95;&#49;&#32;&#45;&#92;&#112;&#105;&#95;&#50;&#32;&#41;&#125;&#123;&#92;&#115;&#113;&#114;&#116;&#123;&#92;&#102;&#114;&#97;&#99;&#123;&#92;&#112;&#105;&#95;&#49;&#40;&#49;&#45;&#92;&#112;&#105;&#95;&#49;&#41;&#125;&#123;&#78;&#95;&#49;&#125;&#43;&#92;&#102;&#114;&#97;&#99;&#123;&#92;&#112;&#105;&#95;&#50;&#40;&#49;&#45;&#92;&#112;&#105;&#95;&#50;&#41;&#125;&#123;&#78;&#95;&#50;&#125;&#125;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"49\" width=\"172\" style=\"vertical-align: -29px;\" \/><\/p>\n<p>&nbsp;<\/p>\n<p>where, of course, under the null hypothesis <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-deea120d14ae8723fa82fea3d87744b8_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#40;&#92;&#112;&#105;&#95;&#49;&#32;&#45;&#92;&#112;&#105;&#95;&#50;&#32;&#41;&#61;&#48;\" title=\"Rendered by QuickLaTeX.com\" height=\"18\" width=\"103\" style=\"vertical-align: -4px;\" \/>. 
Then, using the sample proportions leaves us with <em>t<\/em>:<\/p>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-46eba13fe03ffbad8ccf62631136ed59_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#116;&#61;&#92;&#102;&#114;&#97;&#99;&#123;&#40;&#112;&#95;&#49;&#32;&#45;&#112;&#95;&#50;&#41;&#125;&#123;&#92;&#115;&#113;&#114;&#116;&#123;&#92;&#102;&#114;&#97;&#99;&#123;&#112;&#95;&#49;&#40;&#49;&#45;&#112;&#95;&#49;&#41;&#125;&#123;&#78;&#95;&#49;&#125;&#43;&#92;&#102;&#114;&#97;&#99;&#123;&#112;&#95;&#50;&#40;&#49;&#45;&#112;&#95;&#50;&#41;&#125;&#123;&#78;&#95;&#50;&#125;&#125;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"49\" width=\"166\" style=\"vertical-align: -29px;\" \/><\/p>\n<p>&nbsp;<\/p>\n<p>Again, under the null hypothesis the two groups&#8217; proportions are assumed to be the same so effectively we have:<\/p>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-e578a47b41b8473d37137cee31219405_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#116;&#61;&#92;&#102;&#114;&#97;&#99;&#123;&#40;&#112;&#95;&#49;&#32;&#45;&#112;&#95;&#50;&#41;&#125;&#123;&#92;&#115;&#113;&#114;&#116;&#123;&#112;&#40;&#49;&#45;&#112;&#41;&#40;&#92;&#102;&#114;&#97;&#99;&#123;&#49;&#125;&#123;&#78;&#95;&#49;&#125;&#43;&#92;&#102;&#114;&#97;&#99;&#123;&#50;&#125;&#123;&#78;&#95;&#50;&#125;&#41;&#125;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"41\" width=\"150\" style=\"vertical-align: -21px;\" \/><\/p>\n<p>&nbsp;<\/p>\n<p>Let&#8217;s revisit the cafeteria-preferences example from Section 7.2.2 to see how the <em>t<\/em>-test for testing difference of proportions works.<\/p>\n<p>&nbsp;<\/p>\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<p class=\"textbox__title\"><em>Example 9.3 Do 
You Like the Campus Cafeteria? (A t-Test)<\/em><\/p>\n<\/header>\n<div class=\"textbox__content\">\n<p>&nbsp;<\/p>\n<p>In Chapter 7 we imagined that you asked 35 students in your class<a class=\"footnote\" title=\"Note that this of course is not a random sample; we are using it here only for illustrating how hypothesis testing works so we are effectively pretending it is random. In a real-life study, you should not use non-probability samples for statistical inference.\" id=\"return-footnote-126-2\" href=\"#footnote-126-2\" aria-label=\"Footnote 2\"><sup class=\"footnote\">[2]<\/sup><\/a> whether they liked the campus cafeteria: 12 of your classmates said <em>yes<\/em> (i.e., 34.3 percent), 7 (out of 15) first-years and 5 (out of 20) second-years (46.7 percent of all first-years and 25 percent of all second-years, respectively).<\/p>\n<p>&nbsp;<\/p>\n<p>We want to know whether the difference in proportions observed in the sample (0.467-0.25=0.217) is statistically significant: can it be generalized to a larger student population, or is it due to ordinary sampling variability?<\/p>\n<ul>\n<li>H<sub>0<\/sub>: The proportion of first year students who like the cafeteria is the same as the proportion of second year students who do; <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-ef97578098e79c635a04c84a53de2596_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#112;&#105;&#95;&#49;&#61;&#92;&#112;&#105;&#95;&#50;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"58\" style=\"vertical-align: -4px;\" \/>.<\/li>\n<li>H<sub>a<\/sub>:\u00a0The proportion of first year students who like the cafeteria is different than the proportion of second year students who do; <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-31f4da5c4046578e149ca37283f87396_l3.png\" 
class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#112;&#105;&#95;&#49;&#92;&#110;&#101;&#113;&#92;&#112;&#105;&#95;&#50;\" title=\"Rendered by QuickLaTeX.com\" height=\"17\" width=\"58\" style=\"vertical-align: -4px;\" \/>.<\/li>\n<\/ul>\n<p>Substituting these numbers in the formula we have:<\/p>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-a5ed95f269821d207135eab60d3c7f2a_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#116;&#61;&#92;&#102;&#114;&#97;&#99;&#123;&#40;&#112;&#95;&#49;&#32;&#45;&#112;&#95;&#50;&#41;&#125;&#123;&#92;&#115;&#113;&#114;&#116;&#123;&#112;&#40;&#49;&#45;&#112;&#41;&#40;&#92;&#102;&#114;&#97;&#99;&#123;&#49;&#125;&#123;&#78;&#95;&#49;&#125;&#43;&#92;&#102;&#114;&#97;&#99;&#123;&#50;&#125;&#123;&#78;&#95;&#50;&#125;&#41;&#125;&#125;&#61;&#92;&#102;&#114;&#97;&#99;&#123;&#48;&#46;&#52;&#54;&#55;&#45;&#48;&#46;&#50;&#53;&#125;&#123;&#92;&#115;&#113;&#114;&#116;&#123;&#48;&#46;&#51;&#52;&#51;&#40;&#49;&#45;&#51;&#52;&#51;&#41;&#40;&#92;&#102;&#114;&#97;&#99;&#123;&#49;&#125;&#123;&#49;&#53;&#125;&#43;&#92;&#102;&#114;&#97;&#99;&#123;&#49;&#125;&#123;&#50;&#48;&#125;&#41;&#125;&#125;&#61;&#92;&#102;&#114;&#97;&#99;&#123;&#48;&#46;&#50;&#49;&#55;&#125;&#123;&#48;&#46;&#49;&#54;&#50;&#125;&#61;&#49;&#46;&#51;&#52;\" title=\"Rendered by QuickLaTeX.com\" height=\"41\" width=\"440\" style=\"vertical-align: -21px;\" \/><\/p>\n<p>&nbsp;<\/p>\n<p><strong>With a <em>t<\/em>=1.34, <em>df<\/em>=34, and <em>p<\/em>=0.189 (i.e., <em>p<\/em>&gt;0.05) we <em>fail<\/em> to reject the null hypothesis: at this point we do not have enough evidence to conclude there is a difference between the proportions of first and second year students who like the campus cafeteria. 
The 21.7 percentage points difference is not statistically significant, and has a high enough probability of being due to random chance<\/strong>.<\/p>\n<p>&nbsp;<\/p>\n<p>We can check this with a confidence interval too:<\/p>\n<ul>\n<li>95% CI: <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-2f0c457fdd41e129a06c3d0d8b1bc7e4_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#40;&#112;&#95;&#49;&#32;&#45;&#112;&#95;&#50;&#41;&#92;&#112;&#109;&#49;&#46;&#57;&#54;&#92;&#116;&#105;&#109;&#101;&#115;&#92;&#115;&#113;&#114;&#116;&#123;&#92;&#102;&#114;&#97;&#99;&#123;&#112;&#95;&#49;&#40;&#49;&#45;&#112;&#95;&#49;&#41;&#125;&#123;&#78;&#95;&#49;&#125;&#43;&#92;&#102;&#114;&#97;&#99;&#123;&#112;&#95;&#50;&#40;&#49;&#45;&#112;&#95;&#50;&#41;&#125;&#123;&#78;&#95;&#50;&#125;&#125;&#61;&#48;&#46;&#50;&#49;&#55;&#92;&#112;&#109;&#49;&#46;&#57;&#54;&#92;&#116;&#105;&#109;&#101;&#115;&#92;&#115;&#113;&#114;&#116;&#123;&#92;&#102;&#114;&#97;&#99;&#123;&#48;&#46;&#52;&#54;&#55;&#40;&#48;&#46;&#53;&#51;&#51;&#41;&#125;&#123;&#49;&#53;&#125;&#43;&#92;&#102;&#114;&#97;&#99;&#123;&#48;&#46;&#50;&#53;&#40;&#48;&#46;&#55;&#53;&#41;&#125;&#123;&#50;&#48;&#125;&#125;&#61;&#48;&#46;&#50;&#49;&#55;&#92;&#112;&#109;&#48;&#46;&#51;&#49;&#54;&#61;&#40;&#45;&#48;&#46;&#48;&#57;&#57;&#59;&#32;&#48;&#46;&#53;&#51;&#51;&#41;\" title=\"Rendered by QuickLaTeX.com\" height=\"67\" width=\"580\" style=\"vertical-align: -10px;\" \/><\/li>\n<\/ul>\n<p><strong>In other words, the difference between the proportion of first years and the proportion of second years who like the cafeteria could be anywhere between -9.9 percentage points and 53.3 percentage points with 95% confidence (i.e., about 19 out of 20 intervals built this way from such samples would capture the true difference; note how wide this one is).<\/strong> The difference can be in favour of second years or in favour of the first years (notice the negative lower bound); it can even 
be 0. Thus, <strong>since a difference of 0 (i.e., no difference) is a plausible value, we cannot reject the null hypothesis. We conclude that we do not have enough evidence of an association between year of study and opinion on the campus cafeteria.<\/strong><\/p>\n<\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<p>Admittedly, the formulas look scary but if you have followed through the example above, you have seen by now the actual calculation is quite simple. You can try it out and see for yourself.<\/p>\n<p>&nbsp;<\/p>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<p class=\"textbox__title\"><em>Do It! 9.2 Vegetarianism\/Veganism among Canadian and International Students<\/em><\/p>\n<\/header>\n<div class=\"textbox__content\">\n<p>&nbsp;<\/p>\n<p>Imagine you are interested in exploring whether there is a difference between Canadian and international students in your university when it comes to dietary preferences like vegetarianism and veganism. With your institution&#8217;s registrar&#8217;s assistance, you take a random sample of 100 students and poll them on 1) whether they are a Canadian or an international student, and 2) whether they are vegetarian\/vegan or not.<\/p>\n<p>&nbsp;<\/p>\n<p>You find that you have 70 Canadian and 30 international students in your sample. Out of the Canadian students, 15 (or 21.4 percent) are vegetarian or vegan; out of the international students 5 (or 16.7 percent) have such dietary restrictions.<\/p>\n<p>&nbsp;<\/p>\n<p>Check if the observed <em>in the sample<\/em> difference in proportions is generalizable to the larger student population by testing the hypothesis whether dietary preferences are associated with country of origin. 
Create a 95% confidence interval for that difference, and substantively interpret what you have found with both the <em>t<\/em>-test and the confidence interval.<\/p>\n<p>&nbsp;<\/p>\n<p><sub>Useful hint 1: Among the 100, there are 20 vegan\/vegetarian students in total.<\/sub><\/p>\n<p><sub>Useful hint 2: You can find the <em>p<\/em>-value of your <em>t<\/em>-statistic here: <a href=\"https:\/\/www.socscistatistics.com\/pvalues\/tdistribution.aspx\">https:\/\/www.socscistatistics.com\/pvalues\/tdistribution.aspx<\/a>.<\/sub><\/p>\n<\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<p>Of course, discrete variables do not have to be binary: they can have more than two categories each. Just like in the case of a continuous and a discrete variables&#8217; association discussed in the previous section where non-binary variables required the use of an <em>F<\/em>-test, there is a different test for testing the association between any two discrete variables, regardless of their respective number of categories (i.e., not just binary ones).<\/p>\n<p>&nbsp;<\/p>\n<p><strong>The\u00a0<em>\u03c7<sup>2<\/sup><\/em>-test for testing associations between discrete variables.\u00a0<\/strong>The\u00a0<em>\u03c7<sup>2<\/sup><\/em>-test<a class=\"footnote\" title=\"This is the small-case Greek letter h, \u03c7. It is pronounced [KHAI], but since it is transliterated as chi, many people incorrectly pronounce it as [CHAI] or even [CHEE]. The test itself is called chi-squared test (again, pronounced as [KHAI- squared] not [CHAI- or CHEE-squared]).\" id=\"return-footnote-126-3\" href=\"#footnote-126-3\" aria-label=\"Footnote 3\"><sup class=\"footnote\">[3]<\/sup><\/a> (or Pearson&#8217;s\u00a0<em>\u03c7<sup>2<\/sup><\/em>-test) is based on <strong>a comparison between the\u00a0<em>observed<\/em> and the\u00a0<em>expected<\/em> cell values in a contingency table.<\/strong><\/p>\n<p>&nbsp;<\/p>\n<p>The observed values are the cell counts you see in a contingency table given a specific dataset. 
The expected values, on the other hand, are the counts we would <em>expect<\/em> to see <em>if there were no pattern\/association in the data<\/em>. In other words, the test effectively compares the sample to a null-hypothesis-like hypothetical distribution of the observations across the cells. Thus, logically, <strong>if there is a relatively large difference between the observed and the expected values, we can take that as evidence <em>against<\/em> the null hypothesis and reject it. If, however, the difference between observed and expected values is relatively small, the evidence against the null hypothesis will be insufficient and we would <em>fail<\/em> to reject it.<\/strong><\/p>\n<p>&nbsp;<\/p>\n<p>The actual way the\u00a0<em>\u03c7<sup>2<\/sup><\/em><sup>\u00a0<\/sup>is calculated is this:<\/p>\n<p>&nbsp;<\/p>\n<p class=\"ql-center-displayed-equation\" style=\"line-height: 43px;\"><span class=\"ql-right-eqno\"> &nbsp; <\/span><span class=\"ql-left-eqno\"> &nbsp; <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-03952b6dae1223ad2d9b628ea21010f3_l3.png\" height=\"43\" width=\"133\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"&#92;&#091;&#92;&#99;&#104;&#105;&#94;&#50;&#61;&#92;&#83;&#105;&#103;&#109;&#97;&#92;&#102;&#114;&#97;&#99;&#123;&#40;&#102;&#95;&#111;&#32;&#45;&#102;&#95;&#101;&#41;&#94;&#50;&#125;&#123;&#102;&#95;&#101;&#125;&#92;&#093;\" title=\"Rendered by QuickLaTeX.com\" \/><\/p>\n<p>&nbsp;<\/p>\n<p>where <em>f<sub>o<\/sub><\/em> is the observed frequency (count) and <em>f<sub>e<\/sub><\/em> is the expected frequency count of a given cell.<\/p>\n<p>&nbsp;<\/p>\n<p>The formula looks more complicated than it is (don&#8217;t they always?) 
&#8212; it only asks us to calculate the difference between the observed and the expected count <em>for each cell<\/em>, square it, and divide it by the expected count. Once we have done this for all cells, we need only add the resulting numbers together to get the\u00a0<em>\u03c7<sup>2<\/sup><\/em>.<\/p>\n<p>&nbsp;<\/p>\n<p>Considering that the\u00a0<em>\u03c7<sup>2<\/sup><\/em> is then a sum of as many numbers as there are cells, the larger the table (i.e., the more rows and columns there are), the bigger the resulting\u00a0<em>\u03c7<sup>2<\/sup><\/em> will be. To account for that, the\u00a0<em>\u03c7<sup>2<\/sup><\/em> too has degrees of freedom, where, for a contingency table, <em>df<\/em>=(<em>rows<\/em>-1)(<em>columns<\/em>-1). The\u00a0<em>\u03c7<sup>2<\/sup><\/em> follows a\u00a0<em>\u03c7<sup>2<\/sup><\/em>-distribution, which likewise yields a <em>p<\/em>-value for a given <em>df<\/em>.<\/p>\n<p>&nbsp;<\/p>\n<p><strong>The hypothesis testing then follows the same steps as the <em>t<\/em>-test and the <em>F<\/em>-test: obtain the\u00a0<em>\u03c7<sup>2<\/sup><\/em>-value with its specific <em>df<\/em>, find its associated\u00a0<em>p<\/em>-value, and finally compare the <em>p<\/em>-value to the pre-selected significance level. If <em>p<\/em>&lt;<em>\u03b1<\/em>, reject the null hypothesis.<\/strong><\/p>\n<p>&nbsp;<\/p>\n<p>To demonstrate, we will first do a <em>one-way<\/em>\u00a0<em>\u03c7<sup>2<\/sup><\/em> calculation, i.e., one based on the frequency distribution of just <em>one<\/em> variable. (Of course, if tabulated, this would not be considered a contingency table but a frequency table.)<\/p>\n<p>&nbsp;<\/p>\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<p class=\"textbox__title\"><em>Example 9.4 Do You Like the Campus Cafeteria? 
(Univariate \u03c7<sup>2<\/sup>-Test)<\/em><\/p>\n<\/header>\n<div class=\"textbox__content\">\n<p>&nbsp;<\/p>\n<p>To reuse the imaginary data from before, 12 of the 35 students polled admitted liking the campus cafeteria food. (Since we are interested only in one of the variables, here we ignore whether the students who like the cafeteria are first- or second-years.) As such, we have the following table:<\/p>\n<p>&nbsp;<\/p>\n<p><em>Table 9.1 Approval of the Campus Cafeteria, Observed Count (Univariate)<\/em><\/p>\n<table class=\"lines\" style=\"border-collapse: collapse;width: 50.2841%;height: 85px\">\n<tbody>\n<tr style=\"height: 15px\">\n<td style=\"width: 2.83286%;height: 15px\"><strong>Yes<\/strong><\/td>\n<td style=\"width: 2.83286%;height: 15px\">12<\/td>\n<\/tr>\n<tr style=\"height: 15px\">\n<td style=\"width: 2.83286%;height: 15px\"><strong>No<\/strong><\/td>\n<td style=\"width: 2.83286%;height: 15px\">23<\/td>\n<\/tr>\n<tr style=\"height: 15px\">\n<td style=\"width: 2.83286%;height: 15px\"><strong>Total<\/strong><\/td>\n<td style=\"width: 2.83286%;height: 15px\">35<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n<p>If you did not know anything about the campus cafeteria and had no observations about it whatsoever (i.e., had you been an impartial observer, as it were), wouldn&#8217;t you expect to see an approximately 50\/50 split of the 35 students into the two categories? After all, there are only two groups, and an unbiased (random) distribution would be exactly like everyone flipping a coin to decide which group they end up in. 
Thus, <strong>the expected count here is simply <em>N<\/em> divided by the number of groups\/categories<\/strong> (denoted by <em>k<\/em>):<\/p>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-48205f1fb1fe9e6e129c1bdfd86f1e13_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#102;&#95;&#101;&#61;&#92;&#102;&#114;&#97;&#99;&#123;&#78;&#125;&#123;&#107;&#125;&#61;&#92;&#102;&#114;&#97;&#99;&#123;&#51;&#53;&#125;&#123;&#50;&#125;&#61;&#49;&#55;&#46;&#53;\" title=\"Rendered by QuickLaTeX.com\" height=\"22\" width=\"152\" style=\"vertical-align: -6px;\" \/><\/p>\n<p>&nbsp;<\/p>\n<p>Table 9.2 adds the expected count in brackets next to the observed count.<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p><em>Table 9.2 Approval of the Campus Cafeteria, Observed and Expected<\/em>\u00a0<em>Count (Univariate)<\/em><\/p>\n<table class=\"lines\" style=\"border-collapse: collapse;width: 50.2841%;height: 85px\">\n<tbody>\n<tr style=\"height: 15px\">\n<td style=\"width: 2.83286%;height: 15px\"><strong>Yes<\/strong><\/td>\n<td style=\"width: 2.83286%;height: 15px\">12\u00a0 \u00a0 (17.5)<\/td>\n<\/tr>\n<tr style=\"height: 15px\">\n<td style=\"width: 2.83286%;height: 15px\"><strong>No<\/strong><\/td>\n<td style=\"width: 2.83286%;height: 15px\">23\u00a0 \u00a0 (17.5)<\/td>\n<\/tr>\n<tr style=\"height: 15px\">\n<td style=\"width: 2.83286%;height: 15px\"><strong>Total<\/strong><\/td>\n<td style=\"width: 2.83286%;height: 15px\">35<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n<p>Then, according to the formula, this is what we have for each of the two groups:<\/p>\n<ul>\n<li><em>Yes<\/em>-group: <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-4c6819c8291f0a42603c1adbf2bd8a08_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" 
alt=\"&#92;&#102;&#114;&#97;&#99;&#123;&#40;&#102;&#95;&#111;&#45;&#102;&#95;&#101;&#41;&#94;&#50;&#125;&#123;&#102;&#95;&#101;&#125;&#61;&#92;&#102;&#114;&#97;&#99;&#123;&#40;&#49;&#50;&#45;&#49;&#55;&#46;&#53;&#41;&#94;&#50;&#125;&#123;&#49;&#55;&#46;&#53;&#125;&#61;&#92;&#102;&#114;&#97;&#99;&#123;&#51;&#48;&#46;&#50;&#53;&#125;&#123;&#49;&#55;&#46;&#53;&#125;&#61;&#49;&#46;&#55;&#51;\" title=\"Rendered by QuickLaTeX.com\" height=\"29\" width=\"265\" style=\"vertical-align: -9px;\" \/><\/li>\n<li><em>No<\/em>-group: <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-335e24d394f28c30c7f6949e0fe5ced6_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#102;&#114;&#97;&#99;&#123;&#40;&#102;&#95;&#111;&#45;&#102;&#95;&#101;&#41;&#94;&#50;&#125;&#123;&#102;&#95;&#101;&#125;&#61;&#92;&#102;&#114;&#97;&#99;&#123;&#40;&#50;&#51;&#45;&#49;&#55;&#46;&#53;&#41;&#94;&#50;&#125;&#123;&#49;&#55;&#46;&#53;&#125;&#61;&#92;&#102;&#114;&#97;&#99;&#123;&#51;&#48;&#46;&#50;&#53;&#125;&#123;&#49;&#55;&#46;&#53;&#125;&#61;&#49;&#46;&#55;&#51;\" title=\"Rendered by QuickLaTeX.com\" height=\"29\" width=\"265\" style=\"vertical-align: -9px;\" \/><\/li>\n<\/ul>\n<p>Finally, to get the<em>\u00a0\u03c7<sup>2<\/sup> <\/em>we only need to add these two numbers together:<\/p>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-759b4a7638c84b7ef9906411725eb0c3_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" 
alt=\"&#92;&#99;&#104;&#105;&#94;&#50;&#61;&#92;&#83;&#105;&#103;&#109;&#97;&#92;&#102;&#114;&#97;&#99;&#123;&#40;&#102;&#95;&#111;&#32;&#45;&#102;&#95;&#101;&#41;&#94;&#50;&#125;&#123;&#102;&#95;&#101;&#125;&#61;&#32;&#92;&#102;&#114;&#97;&#99;&#123;&#40;&#49;&#50;&#45;&#49;&#55;&#46;&#53;&#41;&#94;&#50;&#125;&#123;&#49;&#55;&#46;&#53;&#125;&#43;&#92;&#102;&#114;&#97;&#99;&#123;&#40;&#50;&#51;&#45;&#49;&#55;&#46;&#53;&#41;&#94;&#50;&#125;&#123;&#49;&#55;&#46;&#53;&#125;&#61;&#49;&#46;&#55;&#51;&#43;&#49;&#46;&#55;&#51;&#61;&#51;&#46;&#52;&#54;\" title=\"Rendered by QuickLaTeX.com\" height=\"29\" width=\"464\" style=\"vertical-align: -9px;\" \/><\/p>\n<p>&nbsp;<\/p>\n<p>The degrees of freedom in a one-way <em>\u03c7<sup>2<\/sup><\/em>-test are <em>k<\/em>-1, where <em>k<\/em> is the number of categories\/groups. In this case we have <em>k<\/em>=2, so <em>df<\/em>=1.<\/p>\n<p>&nbsp;<\/p>\n<p><strong>With\u00a0<em>\u03c7<sup>2<\/sup><\/em>=3.46, <em>df<\/em>=1<\/strong>,<strong> and <em>p<\/em>=0.06<\/strong><a class=\"footnote\" title=\"You can check the significance of any \u03c72\u00a0with a convenient online calculator, like this one here: https:\/\/www.socscistatistics.com\/pvalues\/chidistribution.aspx.\" id=\"return-footnote-126-4\" href=\"#footnote-126-4\" aria-label=\"Footnote 4\"><sup class=\"footnote\">[4]<\/sup><\/a> (i.e., <em>p<\/em>&gt;0.05), <strong>we fail to reject the null hypothesis. At this time, we do <em>not<\/em> have enough evidence to conclude that the observed distribution of the students is unusual enough to suggest a pattern different from random variation around a 50\/50 split. 
As such, the result is <em>not<\/em> statistically significant: we cannot conclude that the students lean one way or the other in their opinion about the campus cafeteria.<\/strong><\/p>\n<p>&nbsp;<\/p>\n<\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<p>Calculating a two-way\u00a0<em>\u03c7<sup>2<\/sup><\/em>, by far the more commonly used one as it tests associations between <em>two<\/em> variables, is just as easy, even if it involves calculating more numbers (since in the bivariate case we have more cells; four at the minimum, given a 2&#215;2 cross-tabulation). The next section is devoted to that.<\/p>\n<p>&nbsp;<\/p>\n<hr class=\"before-footnotes clear\" \/><div class=\"footnotes\"><ol><li id=\"footnote-126-1\">Do not forget that <em>p<\/em> here stands for <em>proportion<\/em>, not <em>probability\/p<\/em>-<em>value<\/em>. <a href=\"#return-footnote-126-1\" class=\"return-footnote\" aria-label=\"Return to footnote 1\">&crarr;<\/a><\/li><li id=\"footnote-126-2\">Note that this of course is not a random sample; we are using it here only for illustrating how hypothesis testing works so we are effectively pretending it is random. In a real-life study, you should not use non-probability samples for statistical inference. <a href=\"#return-footnote-126-2\" class=\"return-footnote\" aria-label=\"Return to footnote 2\">&crarr;<\/a><\/li><li id=\"footnote-126-3\">This is the lowercase Greek letter <em>chi<\/em>, <em>\u03c7<\/em>. <em>It is pronounced [KHAI]<\/em>, but since it is transliterated as <em>chi<\/em>, many people incorrectly pronounce it as [CHAI] or even [CHEE]. The test itself is called the chi-squared test (again, pronounced [KHAI-squared], not [CHAI- or CHEE-squared]). 
<a href=\"#return-footnote-126-3\" class=\"return-footnote\" aria-label=\"Return to footnote 3\">&crarr;<\/a><\/li><li id=\"footnote-126-4\">You can check the significance of any <em>\u03c7<sup>2\u00a0<\/sup><\/em>with a convenient online calculator, like this one here: <a href=\"https:\/\/www.socscistatistics.com\/pvalues\/chidistribution.aspx\">https:\/\/www.socscistatistics.com\/pvalues\/chidistribution.aspx<\/a>. <a href=\"#return-footnote-126-4\" class=\"return-footnote\" aria-label=\"Return to footnote 4\">&crarr;<\/a><\/li><\/ol><\/div>","protected":false},"author":533,"menu_order":3,"template":"","meta":{"pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[],"contributor":[],"license":[],"class_list":["post-126","chapter","type-chapter","status-publish","hentry"],"part":120,"_links":{"self":[{"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/pressbooks\/v2\/chapters\/126","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/wp\/v2\/users\/533"}],"version-history":[{"count":25,"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/pressbooks\/v2\/chapters\/126\/revisions"}],"predecessor-version":[{"id":2127,"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/pressbooks\/v2\/chapters\/126\/revisions\/2127"}],"part":[{"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/pressbooks\/v2\/parts\/120"}],"metadata":[{"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/pressbooks\/v2\/chapters\/126\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/wp\/v2\/media?parent=126"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":tr
ue,"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/pressbooks\/v2\/chapter-type?post=126"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/wp\/v2\/contributor?post=126"},{"taxonomy":"license","embeddable":true,"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/wp\/v2\/license?post=126"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}