{"id":130,"date":"2023-05-05T12:48:46","date_gmt":"2023-05-05T16:48:46","guid":{"rendered":"https:\/\/pressbooks.bccampus.ca\/businessanalytics\/chapter\/measures-of-association\/"},"modified":"2024-01-12T18:34:17","modified_gmt":"2024-01-12T23:34:17","slug":"measures-of-association","status":"publish","type":"chapter","link":"https:\/\/pressbooks.bccampus.ca\/businessanalytics\/chapter\/measures-of-association\/","title":{"raw":"Measures of Association","rendered":"Measures of Association"},"content":{"raw":"<div class=\"measures-of-association-\">\r\n<h1>Some Statistics that you may need:<\/h1>\r\n<h2>Correlation<\/h2>\r\n<p class=\"import-Normal\">A [pb_glossary id=\"239\"]correlation[\/pb_glossary] exists between two variables when one of them is related to the other in some way.<\/p>\r\n<p class=\"import-NormalWeb\">A scatter plot is a graph in which the paired [latex](x,y)[\/latex] sample data are plotted with a horizontal [latex]x[\/latex]-axis and a vertical [latex]y[\/latex]-axis.<\/p>\r\n<p class=\"import-NormalWeb\">Linear correlation means the points on our scatter plot fall roughly along a straight line.<\/p>\r\n<p class=\"import-NormalWeb\">The <strong>Linear Correlation Coefficient<\/strong>, or <strong>Pearson Product Moment Correlation Coefficient<\/strong>, is a way to combine how far each [latex]x[\/latex] and [latex]y[\/latex] value lies from its mean into [latex]r[\/latex], a number between [latex]-1[\/latex] and [latex]1[\/latex] which tells us how strong the linear correlation is.<\/p>\r\n\\[r = \\frac{1}{n-1}\\Sigma \\left( \\frac{x-\\bar{x}}{s_x}\\right)\\left( \\frac{y-\\bar{y}}{s_y}\\right)\\]\r\n<p class=\"import-NormalWeb\">where the sum \u2211 is over all ordered 
pairs <em>(<\/em><em>x,y<\/em><em>), <\/em><em>s<\/em><sub><em>x<\/em><\/sub> is the standard deviation of the <em>x<\/em> values, <em>s<\/em><sub><em>y<\/em><\/sub> is the standard deviation of the <em>y<\/em> values, and<\/p>\r\n\r\n<div><span style=\"font-size: NaNpt;color: #;text-decoration: none\">\\[\\bar{x} = E(x) = \\frac{\\Sigma x}{n}\\]\r\n<\/span><\/div>\r\nand\r\n<div><span style=\"font-size: NaNpt;color: #;text-decoration: none\">\\[\\bar{y} = E(y) = \\frac{\\Sigma y}{n}\\]<\/span><\/div>\r\nare the sample means of <em>x<\/em> and <em>y<\/em> respectively.\r\n<table>\r\n<tbody>\r\n<tr class=\"TableGrid-R\">\r\n<td class=\"TableGrid-C\" style=\"border: solid windowtext 0.5pt\">\r\n<p class=\"import-NormalWeb\">Strong Positive<\/p>\r\n<\/td>\r\n<td class=\"TableGrid-C\" style=\"border: solid windowtext 0.5pt\">\r\n<p class=\"import-NormalWeb\">Weak Positive<\/p>\r\n<\/td>\r\n<td class=\"TableGrid-C\" style=\"border: solid windowtext 0.5pt\">\r\n<p class=\"import-NormalWeb\">Weak Negative<\/p>\r\n<\/td>\r\n<td class=\"TableGrid-C\" style=\"border: solid windowtext 0.5pt\">\r\n<p class=\"import-NormalWeb\">Strong Negative<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr class=\"TableGrid-R\">\r\n<td class=\"TableGrid-C\" style=\"border: solid windowtext 0.5pt\">\r\n<p class=\"import-NormalWeb\">\r\n[latex]r = 0.9[\/latex]<\/p>\r\n<\/td>\r\n<td class=\"TableGrid-C\" style=\"border: solid windowtext 0.5pt\">\r\n<p class=\"import-NormalWeb\"><img src=\"#fixme\" alt=\"image\" width=\"85.4px\" height=\"33.8666666666667px\" \/>\r\n[latex]r = 0.3[\/latex]<\/p>\r\n<\/td>\r\n<td class=\"TableGrid-C\" style=\"border: solid windowtext 0.5pt\">\r\n<p class=\"import-NormalWeb\"><img src=\"#fixme\" alt=\"image\" width=\"61.059842519685px\" height=\"24.1133858267717px\" \/>\r\n[latex]r = -0.3[\/latex]<\/p>\r\n<\/td>\r\n<td class=\"TableGrid-C\" style=\"border: solid windowtext 0.5pt\">\r\n<p class=\"import-NormalWeb\"><img src=\"#fixme\" alt=\"image\" width=\"57.3553805774278px\" 
height=\"24.9391076115486px\" \/>\r\n[latex]r = -0.9[\/latex]<\/p>\r\n<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n[caption id=\"\" align=\"alignnone\" width=\"567\"]<img src=\"https:\/\/pressbooks.bccampus.ca\/businessanalytics\/wp-content\/uploads\/sites\/1653\/2023\/05\/image83.png\" alt=\"image of r-square\" width=\"567\" height=\"239\" \/> Some values of r[\/caption]\r\n<h2>Covariance<\/h2>\r\n<p class=\"import-Normal\">Covariance is another way of measuring how two variables vary together. Unlike [latex]r[\/latex], it is not scaled by the standard deviations, so its size depends on the units of the data. It is defined by:<\/p>\r\n<p class=\"import-Normal\"><span style=\"font-size: NaNpt;color: #;text-decoration: none\">\\[cov(X, Y) = \\frac{1}{n}\\Sigma (x_i-\\bar{x})(y_i-\\bar{y})\\]\r\n<\/span><\/p>\r\n<p class=\"import-NormalWeb\">where the sum \u2211 is over all ordered pairs <em>(x,y)<\/em>.<\/p>\r\n&nbsp;\r\n<p class=\"import-NormalWeb\">Fun fact! When you take the covariance or correlation of a set of data with itself, you get the following identities:<\/p>\r\n<p class=\"import-NormalWeb\" style=\"text-align: center\"><span style=\"font-size: NaNpt;color: #;text-decoration: none\">\\[cov(X, X)=\\sigma^2\\]<\/span><\/p>\r\n<p class=\"import-NormalWeb\" style=\"text-align: center\"><span style=\"font-size: NaNpt;color: #;text-decoration: none\">\\[r=corr(X,X)=1\\]<\/span><\/p>\r\n<p class=\"import-Normal\"><img src=\"#fixme\" alt=\"image\" width=\"270.582152230971px\" height=\"37.9433070866142px\" \/>\r\n<img src=\"rId175#fixme\" alt=\"image\" width=\"356.8px\" height=\"213.634960629921px\" \/><strong>Correlation vs. 
Causation<\/strong><\/p>\r\n\r\n\r\n[caption id=\"\" align=\"aligncenter\" width=\"430\"]<img src=\"https:\/\/pressbooks.bccampus.ca\/businessanalytics\/wp-content\/uploads\/sites\/1653\/2023\/05\/image86.jpg\" alt=\"image\" width=\"430\" height=\"343\" \/> This graph really is begging me to give a citation![\/caption]\r\n<p class=\"import-Normal\">In the chart above, we can see a clear correlation between the number of people who drowned by falling into a swimming pool in the USA and the number of films that Nicolas Cage appeared in that year.<\/p>\r\n<p class=\"import-Normal\">CAUSATION: Do you think Nicolas Cage causes drowning?<\/p>\r\n\r\n\r\n[caption id=\"\" align=\"aligncenter\" width=\"459\"]<img src=\"https:\/\/pressbooks.bccampus.ca\/businessanalytics\/wp-content\/uploads\/sites\/1653\/2023\/05\/image87.png\" alt=\"image from XKCD \" width=\"459\" height=\"185\" \/> This is from XKCD, https:\/\/xkcd.com\/552\/[\/caption]\r\n<p class=\"import-Normal\">Or: Does smoking cause lung cancer? It\u2019s harder than you think to prove.<\/p>\r\n<p class=\"import-Normal\">Remember\u2026 
CORRELATION does not imply CAUSATION<\/p>\r\n<p class=\"import-Normal\"><img src=\"#fixme\" alt=\"image\" width=\"82.0913385826772px\" height=\"15.4666666666667px\" \/>\r\n<img src=\"#fixme\" alt=\"image\" width=\"59.2099737532808px\" height=\"23.8px\" \/><\/p>\r\n\r\n<div><a href=\"#sdfootnote1anc\">1<\/a> Measures of Centre, Variation and other measures are adapted from \u201cEssentials of Business Statistics\u201d; Black, Goldlist, Edmunds, Castillo; Wiley Publishing 2018.<\/div>\r\n<\/div>","rendered":"<div class=\"measures-of-association-\">\n<h1>Some Statistics that you may need:<\/h1>\n<h2>Correlation<\/h2>\n<p class=\"import-Normal\">A <a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_130_239\">correlation<\/a> exists between two variables when one of them is related to the other in some way.<\/p>\n<p class=\"import-NormalWeb\">A scatter plot is a graph in which the paired [latex](x,y)[\/latex] sample data are plotted with a horizontal [latex]x[\/latex]-axis and a vertical [latex]y[\/latex]-axis.<\/p>\n<p class=\"import-NormalWeb\">Linear correlation means the points on our scatter plot fall roughly along a straight line.<\/p>\n<p class=\"import-NormalWeb\">The <strong>Linear Correlation Coefficient<\/strong>, or <strong>Pearson Product Moment Correlation Coefficient<\/strong>, is a way to combine how far each [latex]x[\/latex] and [latex]y[\/latex] value lies from its mean into [latex]r[\/latex], a number between [latex]-1[\/latex] and [latex]1[\/latex] which tells us how strong the linear correlation is.<\/p>\n<p>\\[r = \\frac{1}{n-1}\\Sigma \\left( \\frac{x-\\bar{x}}{s_x}\\right)\\left( \\frac{y-\\bar{y}}{s_y}\\right)\\]<\/p>\n<p 
class=\"import-NormalWeb\">where the sum \u2211 is over all ordered pairs <em>(<\/em><em>x,y<\/em><em>), <\/em><em>s<\/em><sub><em>x<\/em><\/sub> is the standard deviation of the <em>x<\/em> values, <em>s<\/em><sub><em>y<\/em><\/sub> is the standard deviation of the <em>y<\/em> values, and<\/p>\n<div><span style=\"font-size: NaNpt;color: #;text-decoration: none\">\\[\\bar{x} = E(x) = \\frac{\\Sigma x}{n}\\]<br \/>\n<\/span><\/div>\n<p>and<\/p>\n<div><span style=\"font-size: NaNpt;color: #;text-decoration: none\">\\[\\bar{y} = E(y) = \\frac{\\Sigma y}{n}\\]<\/span><\/div>\n<p>are the sample means of <em>x<\/em> and <em>y<\/em> respectively.<\/p>\n<table>\n<tbody>\n<tr class=\"TableGrid-R\">\n<td class=\"TableGrid-C\" style=\"border: solid windowtext 0.5pt\">\n<p class=\"import-NormalWeb\">Strong Positive<\/p>\n<\/td>\n<td class=\"TableGrid-C\" style=\"border: solid windowtext 0.5pt\">\n<p class=\"import-NormalWeb\">Weak Positive<\/p>\n<\/td>\n<td class=\"TableGrid-C\" style=\"border: solid windowtext 0.5pt\">\n<p class=\"import-NormalWeb\">Weak Negative<\/p>\n<\/td>\n<td class=\"TableGrid-C\" style=\"border: solid windowtext 0.5pt\">\n<p class=\"import-NormalWeb\">Strong Negative<\/p>\n<\/td>\n<\/tr>\n<tr class=\"TableGrid-R\">\n<td class=\"TableGrid-C\" style=\"border: solid windowtext 0.5pt\">\n<p class=\"import-NormalWeb\">\n[latex]r = 0.9[\/latex]<\/p>\n<\/td>\n<td class=\"TableGrid-C\" style=\"border: solid windowtext 0.5pt\">\n<p class=\"import-NormalWeb\"><img decoding=\"async\" src=\"#fixme\" alt=\"image\" width=\"85.4px\" height=\"33.8666666666667px\" \/><br \/>\n[latex]r = 0.3[\/latex]<\/p>\n<\/td>\n<td class=\"TableGrid-C\" style=\"border: solid windowtext 0.5pt\">\n<p class=\"import-NormalWeb\"><img decoding=\"async\" src=\"#fixme\" alt=\"image\" width=\"61.059842519685px\" height=\"24.1133858267717px\" \/><br \/>\n[latex]r = -0.3[\/latex]<\/p>\n<\/td>\n<td class=\"TableGrid-C\" style=\"border: solid windowtext 0.5pt\">\n<p class=\"import-NormalWeb\"><img 
decoding=\"async\" src=\"#fixme\" alt=\"image\" width=\"57.3553805774278px\" height=\"24.9391076115486px\" \/><br \/>\n[latex]r = -0.9[\/latex]<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<figure style=\"width: 567px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/businessanalytics\/wp-content\/uploads\/sites\/1653\/2023\/05\/image83.png\" alt=\"image of r-square\" width=\"567\" height=\"239\" \/><figcaption class=\"wp-caption-text\">Some values of r<\/figcaption><\/figure>\n<h2>Covariance<\/h2>\n<p class=\"import-Normal\">Covariance is another way of measuring how two variables vary together. Unlike [latex]r[\/latex], it is not scaled by the standard deviations, so its size depends on the units of the data. It is defined by:<\/p>\n<p class=\"import-Normal\"><span style=\"font-size: NaNpt;color: #;text-decoration: none\">\\[cov(X, Y) = \\frac{1}{n}\\Sigma (x_i-\\bar{x})(y_i-\\bar{y})\\]<br \/>\n<\/span><\/p>\n<p class=\"import-NormalWeb\">where the sum \u2211 is over all ordered pairs <em>(x,y)<\/em>.<\/p>\n<p>&nbsp;<\/p>\n<p class=\"import-NormalWeb\">Fun fact! 
When you take the covariance or correlation of a set of data with itself, you get the following identities:<\/p>\n<p class=\"import-NormalWeb\" style=\"text-align: center\"><span style=\"font-size: NaNpt;color: #;text-decoration: none\">\\[cov(X, X)=\\sigma^2\\]<\/span><\/p>\n<p class=\"import-NormalWeb\" style=\"text-align: center\"><span style=\"font-size: NaNpt;color: #;text-decoration: none\">\\[r=corr(X,X)=1\\]<\/span><\/p>\n<p class=\"import-Normal\"><img decoding=\"async\" src=\"#fixme\" alt=\"image\" width=\"270.582152230971px\" height=\"37.9433070866142px\" \/><br \/>\n<img decoding=\"async\" src=\"rId175#fixme\" alt=\"image\" width=\"356.8px\" height=\"213.634960629921px\" \/><strong>Correlation vs. Causation<\/strong><\/p>\n<figure style=\"width: 430px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/businessanalytics\/wp-content\/uploads\/sites\/1653\/2023\/05\/image86.jpg\" alt=\"image\" width=\"430\" height=\"343\" \/><figcaption class=\"wp-caption-text\">This graph really is begging me to give a citation!<\/figcaption><\/figure>\n<p class=\"import-Normal\">In the chart above, we can see a clear correlation between the number of people who drowned by falling into a swimming pool in the USA and the number of films that Nicolas Cage appeared in that year.<\/p>\n<p class=\"import-Normal\">CAUSATION: Do you think Nicolas Cage causes drowning?<\/p>\n<figure style=\"width: 459px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/pressbooks.bccampus.ca\/businessanalytics\/wp-content\/uploads\/sites\/1653\/2023\/05\/image87.png\" alt=\"image from XKCD\" width=\"459\" height=\"185\" \/><figcaption class=\"wp-caption-text\">This is from XKCD, https:\/\/xkcd.com\/552\/<\/figcaption><\/figure>\n<p class=\"import-Normal\">Or: Does smoking cause lung cancer? It\u2019s harder than you think to prove.<\/p>\n<p class=\"import-Normal\">Remember\u2026 
CORRELATION does not imply CAUSATION<\/p>\n<p class=\"import-Normal\"><img decoding=\"async\" src=\"#fixme\" alt=\"image\" width=\"82.0913385826772px\" height=\"15.4666666666667px\" \/><br \/>\n<img decoding=\"async\" src=\"#fixme\" alt=\"image\" width=\"59.2099737532808px\" height=\"23.8px\" \/><\/p>\n<div><a href=\"#sdfootnote1anc\">1<\/a> Measures of Centre, Variation and other measures are adapted from \u201cEssentials of Business Statistics\u201d; Black, Goldlist, Edmunds, Castillo; Wiley Publishing 2018.<\/div>\n<\/div>\n<div class=\"glossary\"><span class=\"screen-reader-text\" id=\"definition\">definition<\/span><template id=\"term_130_239\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_130_239\"><div tabindex=\"-1\"><p>The term for a relationship between two variables<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><\/div>","protected":false},"author":883,"menu_order":2,"template":"","meta":{"pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[],"contributor":[],"license":[],"class_list":["post-130","chapter","type-chapter","status-publish","hentry"],"part":102,"_links":{"self":[{"href":"https:\/\/pressbooks.bccampus.ca\/businessanalytics\/wp-json\/pressbooks\/v2\/chapters\/130","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/pressbooks.bccampus.ca\/businessanalytics\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/pressbooks.bccampus.ca\/businessanalytics\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/pressbooks.bccampus.ca\/businessanalytics\/wp-json\/wp\/v2\/users\/883"}],"version-history":[{"count":23,"href":"https:\/\/pressbooks.bccampus.ca\/businessanalytics\/wp-json\/pressbooks\/v2\/chapters\/130\/revisions"}],"predecessor-version":[{"id":379,"href":"https:\/\/pressbooks.bccampus.ca\/businessanaly
tics\/wp-json\/pressbooks\/v2\/chapters\/130\/revisions\/379"}],"part":[{"href":"https:\/\/pressbooks.bccampus.ca\/businessanalytics\/wp-json\/pressbooks\/v2\/parts\/102"}],"metadata":[{"href":"https:\/\/pressbooks.bccampus.ca\/businessanalytics\/wp-json\/pressbooks\/v2\/chapters\/130\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/pressbooks.bccampus.ca\/businessanalytics\/wp-json\/wp\/v2\/media?parent=130"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/\/pressbooks.bccampus.ca\/businessanalytics\/wp-json\/pressbooks\/v2\/chapter-type?post=130"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/pressbooks.bccampus.ca\/businessanalytics\/wp-json\/wp\/v2\/contributor?post=130"},{"taxonomy":"license","embeddable":true,"href":"https:\/\/pressbooks.bccampus.ca\/businessanalytics\/wp-json\/wp\/v2\/license?post=130"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}