{"id":137,"date":"2018-10-31T18:13:57","date_gmt":"2018-10-31T22:13:57","guid":{"rendered":"https:\/\/pressbooks.bccampus.ca\/simplestats\/?post_type=chapter&#038;p=137"},"modified":"2019-11-17T17:48:42","modified_gmt":"2019-11-17T22:48:42","slug":"10-2-4-r-squared","status":"publish","type":"chapter","link":"https:\/\/pressbooks.bccampus.ca\/simplestats\/chapter\/10-2-4-r-squared\/","title":{"raw":"10.2.4 R-squared","rendered":"10.2.4 R-squared"},"content":{"raw":"[latexpage]\r\n\r\nIn the previous section we established that the correlation coefficient <em>r<\/em> and the regression coefficient <em>b<\/em> are related:\r\n\r\n&nbsp;\r\n\r\n$$b=r\\frac{s_y}{s_x}$$\r\n\r\n&nbsp;\r\n\r\nAnd how could they not be: if a slope exists, a correlation exists. As such, the standard regression output provided by SPSS includes a <em>Model Summary<\/em> table that lists the Pearson's <em>r<\/em>. Table 10.6 below is the <em>Model Summary<\/em> table of the simulated-data class attendance\/final class scores regression.\r\n\r\n&nbsp;\r\n\r\n<em>Table 10.6 R and R<sup>2<\/sup> for Class Attendance and Final Class Scores<\/em>\r\n\r\n<img src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/uploads\/sites\/564\/2019\/05\/r2-class-attendance-scores-full.png\" alt=\"\" width=\"395\" height=\"123\" class=\"wp-image-1385 size-full aligncenter\" \/>\r\n\r\n&nbsp;\r\n\r\nPearson's <em>r<\/em> (listed as <em>R<\/em>)\u00a0in this table is, of course, exactly the same as what the SPSS <em>Correlate<\/em> procedure provides. 
Squaring that number, however, provides us with a new and useful piece of information, sometimes called <strong>the <em>coefficient of determination<\/em>, but more often simply referred to as <em>R<sup>2<\/sup><\/em><\/strong>.\r\n\r\n&nbsp;\r\n\r\n$$r\\times r=R^2$$\r\n\r\n&nbsp;\r\n\r\n<strong style=\"font-size: 14pt;text-indent: 18.6667px\"><em>R<sup>2<\/sup><\/em>\u00a0provides a measure of the proportion of the variability in the dependent variable explained by the independent variable<\/strong>[footnote]Or, independent variable<strong>s<\/strong>, in the case of multivariate regression.[\/footnote] <strong>in the model.<\/strong>\r\n\r\n&nbsp;\r\n\r\n$$R^2=\\frac{explained~variation~of~y}{total~variation~of~y}$$\r\n\r\n&nbsp;\r\n\r\nRecall that regression's logic is based on minimizing residuals\/errors and on explaining the variation of the dependent variable through information about the independent variable. In a deterministic case, the dependent variable will depend entirely on the independent one, and then we would have a perfect correlation (<em>r<\/em> of 1 or -1) and <em>R<sup>2<\/sup><\/em>=1. However, with uncertainty and estimation, this is not the case -- some variability of the dependent variable remains unexplained by the regression model (i.e., the independent variable).\r\n\r\n&nbsp;\r\n\r\nThus, one way to look at <em>R<sup>2<\/sup><\/em> is as an indication of <em>goodness of fit<\/em>: how closely the observations cluster around the regression line (i.e., how little variability is left unexplained). 
The larger <em>R<sup>2<\/sup><\/em>, then, the better, as a large <em>R<sup>2<\/sup><\/em> would mean the model (the independent variable\/s) explains a large proportion of the variability in the dependent variable.\r\n\r\n&nbsp;\r\n\r\nAs you can see in Table 10.6 above, the <em>R<sup>2<\/sup><\/em> for the class attendance\/final test scores regression is:\r\n\r\n&nbsp;\r\n\r\n$$r\\times r=0.849^2=0.721=R^2$$\r\n\r\n&nbsp;\r\n\r\nOr, class attendance explains 72.1 percent of the variability in final test scores, which is a lot, and quite a good regression fit[footnote]Of course, this also means that (100-72.1=) 27.9 percent of the variation in test scores is left unexplained by class attendance, i.e., is due to something else beyond class attendance.[\/footnote].\r\n\r\n&nbsp;\r\n\r\nCompare this to the <em>Model Summary<\/em> table of respondent's and father's years of schooling in Table 10.7 below.\r\n\r\n&nbsp;\r\n\r\n<em>Table 10.7 R and R<sup>2<\/sup> for Respondent's and Father's Years of Schooling<\/em>\r\n\r\n<img src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/uploads\/sites\/564\/2019\/05\/r2-educ-paeduc.png\" alt=\"\" width=\"392\" height=\"135\" class=\"wp-image-1394 size-full aligncenter\" \/>\r\n\r\n&nbsp;\r\n\r\nUnlike the very strong correlation of <em>r<\/em>=0.849, the moderately weak correlation coefficient <em>r<\/em>=0.413 already indicates that the fit is not that great. Thus, the\u00a0<em>R<sup>2<\/sup><\/em> of offspring and parental education is:\r\n\r\n&nbsp;\r\n\r\n$$r\\times r=0.413^2=0.170=R^2$$\r\n\r\n&nbsp;\r\n\r\nThat is, fathers' years of schooling explain only 17 percent of the variation of respondents' years of schooling. The biggest 'chunk' of the variation in schooling is left unexplained, i.e., there are other factors influencing how much education one is expected to have, on average. 
Regardless, we should not dismiss parental education outright -- it still has a statistically significant effect on offspring education (albeit not very strong).\r\n\r\n&nbsp;\r\n\r\n. . . Or does it? Recall our discussion on causality. The fact that two variables are <em>statistically<\/em> associated does not necessarily mean that one <em>causes<\/em> the other to change (or, that it explains the other's variability). Working with\u00a0<span style=\"text-indent: 37.3333px;font-size: 14pt\">only\u00a0<\/span><span style=\"text-align: initial;text-indent: 2em;font-size: 14pt\">two variables prevents us from accounting for alternative explanations -- i.e., of taking into account other factors, other variables, other effects. Luckily, regression has our backs. I leave you with how that happens in the next -- and\u00a0<\/span><em style=\"text-align: initial;text-indent: 2em;font-size: 14pt\">final!<\/em><span style=\"text-align: initial;text-indent: 2em;font-size: 14pt\"> -- section of this textbook.<\/span>","rendered":"<p>In the previous section we established that the correlation coefficient <em>r<\/em> and the regression coefficient <em>b<\/em> are related:<\/p>\n<p>&nbsp;<\/p>\n<p class=\"ql-center-displayed-equation\" style=\"line-height: 35px;\"><span class=\"ql-right-eqno\"> &nbsp; <\/span><span class=\"ql-left-eqno\"> &nbsp; <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-59f8ecf2d9a4a7a60cc3ddd80b4ed0d3_l3.png\" height=\"35\" width=\"58\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"&#92;&#091;&#98;&#61;&#114;&#92;&#102;&#114;&#97;&#99;&#123;&#115;&#95;&#121;&#125;&#123;&#115;&#95;&#120;&#125;&#92;&#093;\" title=\"Rendered by QuickLaTeX.com\" \/><\/p>\n<p>&nbsp;<\/p>\n<p>And how could they not be: if a slope exists, a correlation exists. 
As such, the standard regression output provided by SPSS includes a <em>Model Summary<\/em> table that lists the Pearson&#8217;s <em>r<\/em>. Table 10.6 below is the <em>Model Summary<\/em> table of the simulated-data class attendance\/final class scores regression.<\/p>\n<p>&nbsp;<\/p>\n<p><em>Table 10.6 R and R<sup>2<\/sup> for Class Attendance and Final Class Scores<\/em><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/uploads\/sites\/564\/2019\/05\/r2-class-attendance-scores-full.png\" alt=\"\" width=\"395\" height=\"123\" class=\"wp-image-1385 size-full aligncenter\" srcset=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/uploads\/sites\/564\/2019\/05\/r2-class-attendance-scores-full.png 395w, https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/uploads\/sites\/564\/2019\/05\/r2-class-attendance-scores-full-300x93.png 300w, https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/uploads\/sites\/564\/2019\/05\/r2-class-attendance-scores-full-65x20.png 65w, https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/uploads\/sites\/564\/2019\/05\/r2-class-attendance-scores-full-225x70.png 225w, https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/uploads\/sites\/564\/2019\/05\/r2-class-attendance-scores-full-350x109.png 350w\" sizes=\"auto, (max-width: 395px) 100vw, 395px\" \/><\/p>\n<p>&nbsp;<\/p>\n<p>Pearson&#8217;s <em>r<\/em> (listed as <em>R<\/em>)\u00a0in this table is, of course, exactly the same as what the SPSS <em>Correlate<\/em> procedure provides. 
Squaring that number, however, provides us with a new and useful piece of information, sometimes called <strong>the <em>coefficient of determination<\/em>, but more often simply referred to as <em>R<sup>2<\/sup><\/em><\/strong>.<\/p>\n<p>&nbsp;<\/p>\n<p class=\"ql-center-displayed-equation\" style=\"line-height: 17px;\"><span class=\"ql-right-eqno\"> &nbsp; <\/span><span class=\"ql-left-eqno\"> &nbsp; <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-1e73c07395ed96db39b253df2917679b_l3.png\" height=\"17\" width=\"83\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"&#92;&#091;&#114;&#92;&#116;&#105;&#109;&#101;&#115;&#32;&#114;&#61;&#82;&#94;&#50;&#92;&#093;\" title=\"Rendered by QuickLaTeX.com\" \/><\/p>\n<p>&nbsp;<\/p>\n<p><strong style=\"font-size: 14pt;text-indent: 18.6667px\"><em>R<sup>2<\/sup><\/em>\u00a0provides a measure of the proportion of the variability in the dependent variable explained by the independent variable<\/strong> (or independent variable<strong>s<\/strong>, in the case of multivariate regression) <strong>in the model.<\/strong><\/p>\n<p>&nbsp;<\/p>\n<p class=\"ql-center-displayed-equation\" style=\"line-height: 41px;\"><span class=\"ql-right-eqno\"> &nbsp; <\/span><span class=\"ql-left-eqno\"> &nbsp; <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-3f1123f92a5cb5473f1c22012752dcbb_l3.png\" height=\"41\" width=\"245\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"&#92;&#091;&#82;&#94;&#50;&#61;&#92;&#102;&#114;&#97;&#99;&#123;&#101;&#120;&#112;&#108;&#97;&#105;&#110;&#101;&#100;&#126;&#118;&#97;&#114;&#105;&#97;&#116;&#105;&#111;&#110;&#126;&#111;&#102;&#126;&#121;&#125;&#123;&#116;&#111;&#116;&#97;&#108;&#126;&#118;&#97;&#114;&#105;&#97;&#116;&#105;&#111;&#110;&#126;&#111;&#102;&#126;&#121;&#125;&#92;&#093;\" 
title=\"Rendered by QuickLaTeX.com\" \/><\/p>\n<p>&nbsp;<\/p>\n<p>Recall that regression&#8217;s logic is based on minimizing residuals\/errors and on explaining the variation of the dependent variable through information about the independent variable. In a deterministic case, the dependent variable will depend entirely on the independent one, and then we would have a perfect correlation (<em>r<\/em> of 1 or -1) and <em>R<sup>2<\/sup><\/em>=1. However, with uncertainty and estimation, this is not the case &#8212; some variability of the dependent variable remains unexplained by the regression model (i.e., the independent variable).<\/p>\n<p>&nbsp;<\/p>\n<p>Thus, one way to look at <em>R<sup>2<\/sup><\/em> is as an indication of <em>goodness of fit<\/em>: how closely the observations cluster around the regression line (i.e., how little variability is left unexplained). The larger <em>R<sup>2<\/sup><\/em>, then, the better, as a large <em>R<sup>2<\/sup><\/em> would mean the model (the independent variable\/s) explains a large proportion of the variability in the dependent variable.<\/p>\n<p>&nbsp;<\/p>\n<p>As you can see in Table 10.6 above, the <em>R<sup>2<\/sup><\/em> for the class attendance\/final test scores regression is:<\/p>\n<p>&nbsp;<\/p>\n<p class=\"ql-center-displayed-equation\" style=\"line-height: 18px;\"><span class=\"ql-right-eqno\"> &nbsp; <\/span><span class=\"ql-left-eqno\"> &nbsp; <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-2751edee5ceadd8ea821b454c42eed81_l3.png\" height=\"18\" width=\"219\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"&#92;&#091;&#114;&#92;&#116;&#105;&#109;&#101;&#115;&#32;&#114;&#61;&#48;&#46;&#56;&#52;&#57;&#94;&#50;&#61;&#48;&#46;&#55;&#50;&#49;&#61;&#82;&#94;&#50;&#92;&#093;\" title=\"Rendered by QuickLaTeX.com\" \/><\/p>\n<p>&nbsp;<\/p>\n<p>Or, class attendance explains 72.1 percent of the variability in final test scores, which is a lot, 
and quite a good regression fit<a class=\"footnote\" title=\"Of course, this also means that (100-72.1=) 27.9 percent of the variation in test scores is left unexplained by class attendance, i.e., is due to something else beyond class attendance.\" id=\"return-footnote-137-1\" href=\"#footnote-137-1\" aria-label=\"Footnote 1\"><sup class=\"footnote\">[1]<\/sup><\/a>.<\/p>\n<p>&nbsp;<\/p>\n<p>Compare this to the <em>Model Summary<\/em> table of respondent&#8217;s and father&#8217;s years of schooling in Table 10.7 below.<\/p>\n<p>&nbsp;<\/p>\n<p><em>Table 10.7 R and R<sup>2<\/sup> for Respondent&#8217;s and Father&#8217;s Years of Schooling<\/em><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/uploads\/sites\/564\/2019\/05\/r2-educ-paeduc.png\" alt=\"\" width=\"392\" height=\"135\" class=\"wp-image-1394 size-full aligncenter\" srcset=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/uploads\/sites\/564\/2019\/05\/r2-educ-paeduc.png 392w, https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/uploads\/sites\/564\/2019\/05\/r2-educ-paeduc-300x103.png 300w, https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/uploads\/sites\/564\/2019\/05\/r2-educ-paeduc-65x22.png 65w, https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/uploads\/sites\/564\/2019\/05\/r2-educ-paeduc-225x77.png 225w, https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/uploads\/sites\/564\/2019\/05\/r2-educ-paeduc-350x121.png 350w\" sizes=\"auto, (max-width: 392px) 100vw, 392px\" \/><\/p>\n<p>&nbsp;<\/p>\n<p>Unlike the very strong correlation of <em>r<\/em>=0.849, the moderately weak correlation coefficient <em>r<\/em>=0.413 already indicates that the fit is not that great. 
Thus, the\u00a0<em>R<sup>2<\/sup><\/em> of offspring and parental education is:<\/p>\n<p>&nbsp;<\/p>\n<p class=\"ql-center-displayed-equation\" style=\"line-height: 18px;\"><span class=\"ql-right-eqno\"> &nbsp; <\/span><span class=\"ql-left-eqno\"> &nbsp; <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/ql-cache\/quicklatex.com-c2a1d43bdedf4fe7d56039b55dc4709f_l3.png\" height=\"18\" width=\"219\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"&#92;&#091;&#114;&#92;&#116;&#105;&#109;&#101;&#115;&#32;&#114;&#61;&#48;&#46;&#52;&#49;&#51;&#94;&#50;&#61;&#48;&#46;&#49;&#55;&#48;&#61;&#82;&#94;&#50;&#92;&#093;\" title=\"Rendered by QuickLaTeX.com\" \/><\/p>\n<p>&nbsp;<\/p>\n<p>That is, fathers&#8217; years of schooling explain only 17 percent of the variation of respondents&#8217; years of schooling. The biggest &#8216;chunk&#8217; of the variation in schooling is left unexplained, i.e., there are other factors influencing how much education one is expected to have, on average. Regardless, we should not dismiss parental education outright &#8212; it still has a statistically significant effect on offspring education (albeit not very strong).<\/p>\n<p>&nbsp;<\/p>\n<p>. . . Or does it? Recall our discussion on causality. The fact that two variables are <em>statistically<\/em> associated does not necessarily mean that one <em>causes<\/em> the other to change (or, that it explains the other&#8217;s variability). Working with\u00a0<span style=\"text-indent: 37.3333px;font-size: 14pt\">only\u00a0<\/span><span style=\"text-align: initial;text-indent: 2em;font-size: 14pt\">two variables prevents us from accounting for alternative explanations &#8212; i.e., of taking into account other factors, other variables, other effects. Luckily, regression has our backs. 
I leave you with how that happens in the next &#8212; and\u00a0<\/span><em style=\"text-align: initial;text-indent: 2em;font-size: 14pt\">final!<\/em><span style=\"text-align: initial;text-indent: 2em;font-size: 14pt\"> &#8212; section of this textbook.<\/span><\/p>\n<hr class=\"before-footnotes clear\" \/><div class=\"footnotes\"><ol><li id=\"footnote-137-1\">Of course, this also means that (100-72.1=) 27.9 percent of the variation in test scores is left unexplained by class attendance, i.e., is due to something else beyond class attendance. <a href=\"#return-footnote-137-1\" class=\"return-footnote\" aria-label=\"Return to footnote 1\">&crarr;<\/a><\/li><\/ol><\/div>","protected":false},"author":533,"menu_order":6,"template":"","meta":{"pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[],"contributor":[],"license":[],"class_list":["post-137","chapter","type-chapter","status-publish","hentry"],"part":128,"_links":{"self":[{"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/pressbooks\/v2\/chapters\/137","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/wp\/v2\/users\/533"}],"version-history":[{"count":16,"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/pressbooks\/v2\/chapters\/137\/revisions"}],"predecessor-version":[{"id":2165,"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/pressbooks\/v2\/chapters\/137\/revisions\/2165"}],"part":[{"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/pressbooks\/v2\/parts\/128"}],"metadata":[{"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/pressbooks\/v2\/chapters\/137\/metadata\/"}],"wp:attachment":[{"href":"https:\/\
/pressbooks.bccampus.ca\/simplestats\/wp-json\/wp\/v2\/media?parent=137"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/pressbooks\/v2\/chapter-type?post=137"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/wp\/v2\/contributor?post=137"},{"taxonomy":"license","embeddable":true,"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/wp\/v2\/license?post=137"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}