
Chi-Squared Test of Independence

Steps for Chi-Squared Test of Independence

Learning Objectives

Define the steps and formulas required to perform a Chi-Squared Test of Independence

The steps and formulas needed to perform a Chi-Squared Test of Independence are presented below. See the section ‘Another Explanation for the $\chi^2$ test’ for the reasoning behind these formulas and what they mean.

Null and Alternate Hypotheses

We are, again, performing a hypothesis test so we need to define our null and alternate hypotheses:

H0: The two categorical variables are independent ($\chi^2 = 0$)

HA: The two categorical variables are dependent ($\chi^2 > 0$)

Expected Value Formula

We calculate an expected value for each category for both populations/groups. This is the frequency we would expect if the two categorical variables are independent.

$$\text{Expected Value} = \frac{\text{Row Total} \times \text{Column Total}}{\text{Grand Total}}$$

We read down the table to find the column total and across the table to find the row total (the total number of people/events in that category). We then multiply the two totals and divide by the total number of people/events overall (‘Grand Total’).
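As a minimal sketch of the expected value formula, the Python snippet below computes the expected count for every cell of a contingency table. The 2×3 table of observed counts and the function name expected_counts are hypothetical, made up purely for illustration; they do not come from the text.

```python
def expected_counts(observed):
    """Return expected counts under independence for a table of observed counts."""
    row_totals = [sum(row) for row in observed]          # read across each row
    col_totals = [sum(col) for col in zip(*observed)]    # read down each column
    grand_total = sum(row_totals)
    # Expected value = row total * column total / grand total, cell by cell.
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical observed counts: rows are two groups, columns are three categories.
observed = [[20, 30, 50],
            [30, 30, 40]]

expected = expected_counts(observed)
print(expected)  # [[25.0, 30.0, 45.0], [25.0, 30.0, 45.0]]
```

Both rows have the same expected counts here only because the two hypothetical groups happen to have the same row total (100 each).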

$\chi^2_{test}$ Formula

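We now take the difference between each observed (actual) value and the expected value for that category, square it, divide by the expected value, and sum over all categories:

$$\chi^2_{test} = \sum \frac{(obs - exp)^2}{exp}$$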

$\chi^2$ is, essentially, a weighted sum of the squared differences between the actual and expected frequencies, with each squared difference scaled by its expected frequency. If it is much larger than zero, then the actual values are very different from the values we would expect if the two categorical variables were independent.
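The test statistic can be sketched in a few lines of Python. The observed counts below are hypothetical, the expected counts are the values the expected value formula gives for that table, and the function name chi_squared_statistic is an illustrative choice rather than anything defined in the text.

```python
def chi_squared_statistic(observed, expected):
    """Sum (obs - exp)^2 / exp over every cell of the table."""
    return sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(observed, expected)
        for obs, exp in zip(obs_row, exp_row)
    )


# Hypothetical observed counts and their expected counts under independence.
observed = [[20, 30, 50],
            [30, 30, 40]]
expected = [[25.0, 30.0, 45.0],
            [25.0, 30.0, 45.0]]

chi2_test = chi_squared_statistic(observed, expected)
print(round(chi2_test, 4))  # 3.1111
```

Cells where the observed and expected counts agree (the middle column here) contribute nothing to the statistic.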

Degrees of Freedom and p-value Formula

Once we have determined the test statistic ($\chi^2_{test}$), we determine the associated p-value. Before we do that, we need to calculate the degrees of freedom for the problem:

$$\text{Degrees of Freedom} = df = (\#\text{rows} - 1) \times (\#\text{columns} - 1)$$
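For illustration, the degrees of freedom can be read off the shape of the table. The sketch below assumes the table holds only the category counts (no total row or column); the table and the function name degrees_of_freedom are hypothetical.

```python
def degrees_of_freedom(table):
    """(#rows - 1) * (#columns - 1) for a table without total rows/columns."""
    n_rows = len(table)
    n_cols = len(table[0])
    return (n_rows - 1) * (n_cols - 1)


# Hypothetical 2x3 table of observed counts.
observed = [[20, 30, 50],
            [30, 30, 40]]

df = degrees_of_freedom(observed)
print(df)  # 2
```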

We now plug the test statistic and degrees of freedom into the CHISQ.DIST.RT Excel function:

$$\text{p-value} = \text{CHISQ.DIST.RT}(\chi^2_{test}, df)$$

If the p-value returned is less than the level of significance, the deviations between the observed and the expected counts are too large to be attributed to chance alone, which indicates a dependence between the categorical variables.
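Outside of Excel, the same right-tail probability can be sketched in Python using only the standard library. The helper name chisq_dist_rt is hypothetical (chosen to mirror the Excel function), and the inputs (a test statistic of 28/9 with 2 degrees of freedom) come from a made-up 2×3 table, not from the text. The sketch relies on the recurrence Q(a + 1, z) = Q(a, z) + z^a e^(-z) / Γ(a + 1) for the regularized upper incomplete gamma function and is only intended for moderate degrees of freedom. In practice, SciPy's scipy.stats.chi2.sf(x, df) computes the same quantity.

```python
import math


def chisq_dist_rt(stat, df):
    """Right-tail probability P(X >= stat) for a chi-squared variable with df degrees of freedom."""
    if stat < 0 or df < 1:
        raise ValueError("stat must be >= 0 and df must be a positive integer")
    z = stat / 2.0
    if df % 2 == 0:
        a = 1.0
        q = math.exp(-z)              # starting value Q(1, z)
    else:
        a = 0.5
        q = math.erfc(math.sqrt(z))   # starting value Q(1/2, z)
    # Step a up by 1 until it reaches df / 2, adding one term each time.
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1)
        a += 1.0
    return q


# Hypothetical test statistic and degrees of freedom.
chi2_test = 28 / 9
df = 2

p_value = chisq_dist_rt(chi2_test, df)
print(round(p_value, 4))  # 0.2111
```

With a level of significance of 0.05, a p-value of about 0.21 would not be small enough to reject H0 for this hypothetical table.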

Decision

Just like all the other hypothesis tests we have performed, if the p-value returned is less than the level of significance, then we reject H0. If not, we fail to reject H0. That is:

  • if p-value < L.O.S (Level of Significance): Reject H0
  • if p-value ≥ L.O.S (Level of Significance): Do not reject H0
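The decision rule can be written as a small Python function. The function name decide and the default level of significance of 0.05 are assumptions made for illustration; use whatever level of significance the problem specifies.

```python
def decide(p_value, level_of_significance=0.05):
    """Apply the decision rule: reject H0 only when the p-value is below the L.O.S."""
    if p_value < level_of_significance:
        return "Reject H0"
    return "Do not reject H0"


# Hypothetical p-values, for illustration only.
print(decide(0.211))  # Do not reject H0
print(decide(0.003))  # Reject H0
```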

Conclusion

Again, as with the hypothesis tests in previous sections, rejecting H0 means there is sufficient evidence to support the alternate hypothesis. In this case, if we reject H0, there is sufficient evidence to conclude that the two categorical variables are dependent. That is:

  • Reject H0: There is sufficient evidence to conclude that the two categorical variables are dependent.
  • Do not reject H0: There is not sufficient evidence to conclude that the two categorical variables are dependent.

License

An Introduction to Business Statistics for Analytics (1st Edition) Copyright © 2024 by Amy Goldlist; Charles Chan; Leslie Major; Michael Johnson is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, except where otherwise noted.