
15 Measures of Association

Some Statistics that you may need:

Correlation

A correlation exists between two variables when one of them is related to the other in some way.

A scatter plot is a graph in which the paired (x,y) sample data are plotted with a horizontal x-axis and a vertical y-axis.

Linear correlation means the plotted points fall roughly along a straight line.

IMAGE

The Linear Correlation Coefficient, or Pearson Product Moment Correlation Coefficient, standardizes how x and y vary together to give r, a number between -1 and 1 that tells us how strong the linear correlation is and in which direction it runs.

$$r=\frac{1}{n-1}\sum\left(\frac{x-\bar{x}}{s_x}\right)\left(\frac{y-\bar{y}}{s_y}\right)$$

where the sum ∑ is over all ordered pairs (x,y), $s_x$ is the standard deviation of the x values, $s_y$ is the standard deviation of the y values, and

$$\bar{x}=E(x)=\frac{\sum x}{n}$$

and

$$\bar{y}=E(y)=\frac{\sum y}{n}$$

are the sample means of x and y respectively.
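
To make the formula concrete, here is a minimal Python sketch that computes r term by term. The function name `pearson_r` and the small data set are made up for illustration, and the sketch assumes that $s_x$ and $s_y$ are sample standard deviations (dividing by n - 1), which matches the 1/(n - 1) factor in the formula.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Linear correlation coefficient r for paired samples xs and ys."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")

    # Sample means of x and y
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Sample standard deviations (assumption: divide by n - 1).
    # If either one is zero (all values identical), r is undefined
    # and the division below raises ZeroDivisionError.
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))

    # Sum the products of the standardized values, then divide by n - 1
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)

# Hypothetical paired data, for illustration only
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

r = pearson_r(hours, scores)
print(round(r, 3))  # 0.775
```

A value of about 0.775 would sit between the weak and strong positive cases: the points trend upward but do not lie exactly on a line.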

Some values of r:

Strong Positive: r = 0.9

Weak Positive: r = 0.3

Weak Negative: r = -0.3

Strong Negative: r = -0.9

Covariance

Covariance is another way of measuring how two variables vary together. Like r, it describes linear association, but it is not standardized, so its size depends on the units of x and y. It is defined by:

$$\operatorname{cov}(X,Y)=\frac{1}{n}\sum(x_i-\bar{x})(y_i-\bar{y})$$

where the sum ∑ is over all ordered pairs $(x_i, y_i)$.
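
The definition can be sketched in a few lines of Python. The function name `covariance` and the height and weight figures are hypothetical, chosen only to show that covariance changes when the units of measurement change, which is one reason r is often easier to interpret.

```python
def covariance(xs, ys):
    """Covariance of paired data, dividing by n as in the definition above."""
    n = len(xs)
    if n != len(ys) or n == 0:
        raise ValueError("need two equal-length, non-empty samples")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Average product of the deviations from each mean
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n

# Hypothetical data, for illustration only
heights_cm = [150, 160, 170, 180]
weights_kg = [50, 58, 66, 80]

cov_cm = covariance(heights_cm, weights_kg)

# Same people, heights converted to metres
heights_m = [h / 100 for h in heights_cm]
cov_m = covariance(heights_m, weights_kg)

print(round(cov_cm, 3))  # 122.5
print(round(cov_m, 3))   # 1.225
```

The relationship between height and weight is identical in both runs, yet the covariance shrinks by a factor of 100 when heights are expressed in metres.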

 

Fun fact! When you pair a set of data with itself, you get the following identities:

$$\operatorname{cov}(X,X)=\sigma^2$$

$$r=\operatorname{corr}(X,X)=1$$
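
Both identities can be checked numerically. The sketch below is an illustration under stated assumptions: the helper names `covariance` and `correlation` and the data set are made up, and the correlation is computed as covariance divided by the product of the standard deviations. Using 1/n throughout gives the same r as the 1/(n - 1) version, because the factor cancels in the ratio.

```python
from math import sqrt, isclose
from statistics import pvariance  # population variance (divides by n)

def covariance(xs, ys):
    """Covariance of paired data, dividing by n."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n

def correlation(xs, ys):
    """r as covariance scaled by both standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical data, for illustration only
data = [2, 4, 4, 4, 5, 5, 7, 9]

cov_xx = covariance(data, data)
corr_xx = correlation(data, data)

# cov(X, X) agrees with the population variance of X
print(isclose(cov_xx, pvariance(data)))  # True
# corr(X, X) is 1: every point lies exactly on the line y = x
print(isclose(corr_xx, 1.0))             # True
```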

Correlation vs. Causation

image
This graph really is begging me to give a citation!

In this chart, we can see a clear correlation between the number of people who drowned by falling into a swimming pool in the USA each year and the number of films Nicolas Cage appeared in that year.

CAUSATION: Do you think Nicolas Cage causes drowning?

image from XKCD
This is from XKCD, https://xkcd.com/552/

Or: Does smoking cause lung cancer? It’s harder than you think to prove.

Remember: CORRELATION does not imply CAUSATION.


1 Measures of Centre, Variation and other measures are adapted from “Essentials of Business Statistics”; Black, Goldlist, Edmunds, Castillo; Wiley Publishing 2018.

License


Business Analytics Copyright © by Amy Goldlist is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, except where otherwise noted.
