Chi-Squared Test of Independence

Steps for Chi-Squared Test of Independence

Learning Objectives

Define the steps and formulas required to perform a Chi-Squared Test for Independence

Let us now present the steps and formulas we will need to perform a Chi-Squared Test. See the section 'Another Explanation for the [latex]\chi^2[/latex] Test' below for more insight into why these formulas work and what they mean.

Null and Alternate Hypotheses

We are, again, performing a hypothesis test so we need to define our null and alternate hypotheses:

H0: The two categorical variables are independent ([latex]\chi^2 = 0[/latex])

HA: The two categorical variables are dependent ([latex]\chi^2 \neq 0[/latex])

Expected Value Formula

We calculate an expected value for each category for both populations/groups. This is the frequency we would expect if the two categorical variables are independent.

\[\text{Expected Value}= \frac{\text{Row Total}\times \text{Column Total}}{\text{Grand Total}} \]

We read down the table to determine the column total and read across to determine the row total (the number of people/events in that category total). We then divide by the total number of people/events overall (‘Grand Total’).
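As a sketch of this calculation, the snippet below applies the expected value formula to a hypothetical 2×2 contingency table (the counts are made up purely for illustration):

```python
# Hypothetical 2x2 contingency table of observed counts; rows are
# groups/populations, columns are categories (made-up numbers)
observed = [
    [20, 30],  # Group 1
    [25, 25],  # Group 2
]

row_totals = [sum(row) for row in observed]          # read across
col_totals = [sum(col) for col in zip(*observed)]    # read down
grand_total = sum(row_totals)                        # everyone overall

# Expected Value = (Row Total x Column Total) / Grand Total, per cell
expected = [
    [rt * ct / grand_total for ct in col_totals]
    for rt in row_totals
]
print(expected)  # [[22.5, 27.5], [22.5, 27.5]]
```

Note that the expected counts need not be whole numbers: they are the frequencies we would expect *on average* if the variables were independent.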

[latex]\chi^2_{test}[/latex] Formula

For each category, we now take the squared difference between the observed (actual) value and the expected value, divide by the expected value, and sum over all categories:

\[ \chi^2_{test} = \sum \frac{(\text{obs} - \text{exp})^2}{\text{exp}} \]

[latex]\chi^2[/latex] is, essentially, a weighted sum of the squared differences between the actual and expected frequencies. If it is much larger than zero, then the actual values are very different from the values we would expect if the two categorical variables were independent.
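To illustrate, the snippet below computes the test statistic cell by cell for a hypothetical 2×2 table (the observed counts are made up; the expected counts come from the expected value formula applied to that same table):

```python
# Hypothetical observed counts and their matching expected counts
# (expected = row total x column total / grand total, per cell)
observed = [[20, 30], [25, 25]]
expected = [[22.5, 27.5], [22.5, 27.5]]

# chi-squared test statistic: sum of (obs - exp)^2 / exp over every cell
chi2_test = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
print(round(chi2_test, 4))  # 1.0101
```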

Degrees of Freedom and p-value Formula

Once we have determined the test statistic ([latex]\chi^2_{test}[/latex]), next, we should determine the associated p-value. Before we do that, we need to calculate the degrees of freedom for the problem:

\[ \text{Degrees of Freedom} = df = (\#\text{ rows} - 1)\times (\#\text{ columns} - 1) \]

We now plug the test statistic and the degrees of freedom into Excel's CHISQ.DIST.RT function:

\[\text{p-value} = \text{CHISQ.DIST.RT}(\chi^2_{test}, df) \]

If the p-value returned is much less than the level of significance, then the deviations between the observed and the expected counts are too large to be attributed to chance alone; that is, there is evidence of a dependence between the categorical variables.
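Outside of Excel, the same right-tail probability can be computed with SciPy's `chi2.sf` (the survival function, i.e. the right tail, which is the same quantity CHISQ.DIST.RT returns). The test statistic and table dimensions below are made-up values for illustration:

```python
from scipy.stats import chi2

# Hypothetical inputs: a 2x2 table gives df = (2 - 1) x (2 - 1) = 1,
# and we use a made-up test statistic of about 1.0101
chi2_test = 1.0101
n_rows, n_cols = 2, 2
df = (n_rows - 1) * (n_cols - 1)

# chi2.sf(x, df) is the right-tail area, matching CHISQ.DIST.RT(x, df)
p_value = chi2.sf(chi2_test, df)
print(round(p_value, 3))
```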

Decision

Just like all the other hypothesis tests we have performed, if the p-value returned is less than the level of significance, then we reject H0. If not, we fail to reject H0. I.e.:

  • if p-value < L.O.S (Level of Significance): Reject H0
  • if p-value ≥ L.O.S (Level of Significance): Do not reject H0

Conclusion

Again, like all of the other hypothesis tests in previous sections, if we reject H0, there is sufficient evidence to conclude (whatever it is we are trying to conclude). In this case, there will be sufficient evidence to conclude that the two categorical variables are dependent if we reject H0. I.e.:

  • Reject H0: There is sufficient evidence to conclude that the two categorical variables are dependent.
  • Do not reject H0: There is not sufficient evidence to conclude that the two categorical variables are dependent.
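The whole procedure, from contingency table to decision, can be sketched in a few lines using SciPy's `chi2_contingency`, which computes the test statistic, p-value, degrees of freedom, and expected counts in one call. The table and significance level below are made up for illustration:

```python
from scipy.stats import chi2_contingency

# Hypothetical 2x2 contingency table of observed counts and a 5%
# level of significance (both made up for illustration)
observed = [[20, 30], [25, 25]]
alpha = 0.05

# correction=False matches the hand formula above
# (no Yates continuity correction for 2x2 tables)
chi2_test, p_value, df, expected = chi2_contingency(observed, correction=False)

if p_value < alpha:
    print("Reject H0: sufficient evidence the variables are dependent.")
else:
    print("Do not reject H0: not sufficient evidence the variables are dependent.")
```

For this particular made-up table the p-value is well above 0.05, so we do not reject H0.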

Another Explanation for the [latex]\chi^2[/latex] Test

Another explanation for the [latex]\chi^2[/latex] test is:

  • We calculate the frequencies for each category that we would expect to get if there were no difference between the proportions for each group.
  • I.e.: The expected values are calculated by assuming that the categories are independent of which population they belong to.
  • We then calculate a weighted squared difference between the expected frequencies we calculated and the actual frequencies.
  • This difference is called the [latex]\chi^2_{test}[/latex].
  • If the value of [latex]\chi^2_{test}[/latex] is large, this means that the actual values are very different from the values we should get if the categories were independent of the population they belong to.
  • If that’s the case, we conclude that the categories cannot be independent of the populations they belong to.
  • Another conclusion we can draw is that the proportions are different between populations.

License


An Introduction to Business Statistics for Analytics (1st Edition) Copyright © 2024 by Amy Goldlist; Charles Chan; Leslie Major; Michael Johnson is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, except where otherwise noted.
