15 Measures of Association

Some Statistics that you may need:

Correlation

A correlation exists between two variables when one of them is related to the other in some way.

A scatter plot is a graph in which the paired [latex](x,y)[/latex] sample data are plotted with a horizontal [latex]x[/latex]-axis and a vertical [latex]y[/latex]-axis.

Linear Correlation means our plot looks like a line

IMAGE

The Linear Correlation Coefficient or Pearson Product Moment Correlation Coefficient is a way to look at the variances of our data and come up with [latex]r[/latex], a number which tells us how strong the correlation is.

\[r = \frac{1}{n-1}\Sigma \left( \frac{x-\bar{x}}{s_x}\right)\left( \frac{y-\bar{y}}{s_y}\right)\]

 

\[r = \frac{1}{n-1}\Sigma \left( \frac{x-\bar{x}}{s_x}\right)\left( \frac{y-\bar{y}}{s_y}\right)\]

where the sum ∑ is over all ordered pairs (x,y), sx is the standard deviation of the x values, sy is the standard deviation of the y values, and

\[\bar{x} = E(x) = \frac{\Sigma x}{n}\]

and

\[\bar{y} = E(x) = \frac{\Sigma y}{n}\]

are the sample means of x and y respectively.

Strong Positive

Weak Positive

Weak Negative

Strong Negative

[latex]r = 0.9[/latex]

image
[latex]r =0.3[/latex]

image
[latex]r = -0.3[/latex]

image
[latex]r = -0.9[/latex]

image of r-square
Some values of r

Covariance

Covariance is another way of measuring correlation, but can also look at some non-linear relationships. It is defined by:

\[cov(X, Y) = \frac{1}{n}\Sigma (x_i-\bar{x})(y_i-\bar{y})\]

where the sum ∑ is over all ordered pairs (x,y),

 

Fun fact! When you take the same set of data twice, you get the following identities:

\[cov(X, X)=\sigma^2\]

\[r=corr(X,X)=1\]

image
imageCorrelation vs. Causation

image
This graph really is begging me to give a citation!

In the following chart, we can see a clear correlation between the number of people who drowned by falling in a swimming pool in the USA and number of films that Nicholas Cage appeared in in that year.

CAUSATION: Do you think Nicholas Cage causes drowning?

image from XKCD
This is from XKCD, https://xkcd.com/552/

Or: Does smoking cause lung cancer? It’s harder than you think to prove.

Remember….. CORRELATION does not imply CAUSATION

image
image

1 Measures of Centre, Variation and other measures are adapted from “Essentials of Business Statistics”; Black, Goldlist, Edmunds, Castillo; Wiley Publishing 2018.
definition

License

Icon for the Creative Commons Attribution-NonCommercial 4.0 International License

Business Analytics Copyright © by Amy Goldlist is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, except where otherwise noted.

Share This Book