15 Measures of Association
Some statistics that you may need:
Correlation
A correlation exists between two variables when one of them is related to the other in some way.
A scatter plot is a graph in which the paired (x,y) sample data are plotted with a horizontal x-axis and a vertical y-axis.
Linear correlation means that the points in the scatter plot fall close to a straight line.
IMAGE
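The Linear Correlation Coefficient, or Pearson Product Moment Correlation Coefficient, is a number r that measures the strength and direction of the linear relationship between x and y. It always lies between −1 and 1: values near 1 or −1 indicate a strong linear correlation, and values near 0 indicate a weak one.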
$$r = \frac{1}{n-1} \sum \left(\frac{x - \bar{x}}{s_x}\right)\left(\frac{y - \bar{y}}{s_y}\right)$$
where the sum ∑ is over all n ordered pairs (x,y), $s_x$ is the standard deviation of the x values, $s_y$ is the standard deviation of the y values, and $\bar{x}$ and $\bar{y}$ are the sample means of x and y respectively.
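As a rough illustration, the formula for r can be computed directly from its definition. This is a minimal sketch, not a reference implementation: the function name `pearson_r` and the sample data (`hours`, `scores`) are made up for the example, and it assumes neither variable is constant (otherwise the standard deviation is zero and the division fails).

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Sample linear correlation coefficient r for paired data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    # stdev() is the sample standard deviation (it divides by n - 1)
    s_x, s_y = stdev(xs), stdev(ys)
    # Sum of products of the standardized x and y values
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)

# Hypothetical paired data, invented for this example
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

r = pearson_r(hours, scores)
print(round(r, 4))
```

For data that lie exactly on a rising line, such as `pearson_r([1, 2, 3], [2, 4, 6])`, the result is 1 up to floating-point rounding.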
Strong Positive | Weak Positive | Weak Negative | Strong Negative
r=0.9 | | |

Covariance
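Covariance is another way of measuring how two variables vary together. Like r, it picks up linear relationships, but it is not divided by the standard deviations, so its size depends on the units of x and y and it is not confined to the range from −1 to 1. It is defined by:
$$\operatorname{cov}(X,Y) = \frac{1}{n} \sum (x_i - \bar{x})(y_i - \bar{y})$$
where the sum ∑ is over all ordered pairs (x,y).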
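The covariance definition above translates almost line for line into code. The sketch below uses the 1/n form given here; the function name `covariance` and the height and weight figures are invented for illustration. It also shows one practical consequence of covariance not being standardized: changing the units of one variable rescales the result.

```python
from statistics import mean

def covariance(xs, ys):
    """Covariance of paired data, using the 1/n definition."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two equal-length, non-empty samples")
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n

# Hypothetical data, invented for this example
heights_cm = [150, 160, 170, 180]
weights_kg = [50, 58, 66, 80]

cov_cm = covariance(heights_cm, weights_kg)

# Same heights expressed in metres: the covariance shrinks by a factor of 100
heights_m = [h / 100 for h in heights_cm]
cov_m = covariance(heights_m, weights_kg)

print(cov_cm, cov_m)
```

The relationship between the two variables is identical in both cases; only the measurement scale differs, which is why r is usually easier to interpret than a raw covariance.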
Fun fact! When you take the same set of data twice, you get the following identities:
$$\operatorname{cov}(X,X) = \sigma^2$$
$$r = \operatorname{corr}(X,X) = 1$$
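These two identities are easy to check numerically. The sketch below assumes the 1/n definition of covariance and takes the correlation to be the covariance divided by the product of the two standard deviations, which for X paired with itself is just the variance. The data set is made up for the example.

```python
from statistics import mean, pvariance

# Hypothetical data set, invented for this example
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
x_bar = mean(data)

# cov(X, X): every product (x - mean)(x - mean) is a squared deviation
cov_xx = sum((x - x_bar) * (x - x_bar) for x in data) / n

# Population variance (divides by n), computed independently
sigma_squared = pvariance(data)

# corr(X, X) = cov(X, X) / (sigma * sigma)
corr_xx = cov_xx / sigma_squared

print(cov_xx, sigma_squared, corr_xx)
```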
Correlation vs. Causation

In the following chart, we can see a clear correlation between the number of people who drowned by falling into a swimming pool in the USA each year and the number of films that Nicolas Cage appeared in that year.
CAUSATION: Do you think Nicolas Cage causes drowning?

Or: Does smoking cause lung cancer? It’s harder than you think to prove.
Remember: CORRELATION does not imply CAUSATION.
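One way a correlation can arise without causation is a hidden third factor that influences both variables. The simulation below is only a sketch of that idea with synthetic data: `z` is a hypothetical hidden factor, `x` and `y` each depend on `z` plus independent noise, and neither affects the other. The seed, sample size, and noise level are arbitrary choices.

```python
import random
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Sample linear correlation coefficient r for paired data."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)

random.seed(0)  # fixed seed so the run is repeatable
n = 2000

# Hidden factor that drives both observed variables
z = [random.gauss(0, 1) for _ in range(n)]

# x and y are each built from z plus their own noise; x never enters y
x = [zi + random.gauss(0, 0.5) for zi in z]
y = [zi + random.gauss(0, 0.5) for zi in z]

r_xy = pearson_r(x, y)
print(round(r_xy, 2))
```

Here x and y come out strongly correlated even though, by construction, changing one would do nothing to the other.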
The term for a relationship between two variables