Chi-Squared Test of Independence

Chi-Squared Test Example

Learning Objectives

Solve a Chi-Squared Test for Independence problem using both manual and Excel calculations.

Let us now ‘dive in’ to an example where we use a Chi-Squared Test for Independence. Before we do, let us recap the steps we need to perform the Chi-Squared Test for Independence:

  1. State H0 and HA
  2. Calculate the expected frequencies (values)
  3. Calculate the χ² test statistic
  4. Compute the p-value
  5. Make a decision
  6. Draw a conclusion

Example 67.1

Problem Setup: Analysts who work on the popular game ‘Fortnight’ are trying to determine which players to target with in-app purchase promotions. In particular, they want to promote a premium add-on pack. They are wondering whether a player’s level in the game might influence their likelihood of purchasing the add-on. The table below shows the purchase results by level for randomly selected players:

| Level | Purchased Add-On | Did Not Purchase | Total |
|---|---|---|---|
| Bronze | 60 | 40 | 100 |
| Silver | 67 | 63 | 130 |
| Gold or Higher | 49 | 41 | 90 |
| Total | 176 | 144 | 320 |
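As a quick check on data entry, the observed counts can be stored in Python and the margins recomputed. This is a minimal sketch; the variable names (`observed`, `row_totals`, `col_totals`, `n`) are illustrative choices, not part of the original example.

```python
# Observed counts from the table above: one entry per level,
# each tuple is (Purchased Add-On, Did Not Purchase).
observed = {
    "Bronze": (60, 40),
    "Silver": (67, 63),
    "Gold or Higher": (49, 41),
}

# Row totals: number of players sampled at each level.
row_totals = {level: sum(counts) for level, counts in observed.items()}

# Column totals: purchasers vs. non-purchasers across all levels.
col_totals = [sum(counts[j] for counts in observed.values()) for j in range(2)]

# Grand total, i.e. the sample size.
n = sum(row_totals.values())

print(row_totals)  # {'Bronze': 100, 'Silver': 130, 'Gold or Higher': 90}
print(col_totals)  # [176, 144]
print(n)           # 320
```

If any recomputed total disagrees with the printed table, a count was mistyped and every later step would inherit the error.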

Question: Does the player’s level affect whether they will purchase the premium add-on pack? Test at the 5% level of significance.

You Try: Try setting up and solving this problem yourself before reading the solution steps below.

1. The Hypotheses

The hypotheses for a contingency table question always take the same form. The null hypothesis reflects the idea that there is no difference between the groups with respect to preference (i.e., independence). The alternative hypothesis reflects the idea that there is a difference between the groups with respect to preference (i.e., dependence). There are two common methods to state the hypotheses:

Method 1

Define the percentage of players at each level who purchase the add-on, and state the hypotheses in terms of these percentages:

  • [latex]P_B[/latex] = percent of bronze level players who purchase the add-on
  • [latex]P_S[/latex] = percent of silver level players who purchase the add-on
  • [latex]P_G[/latex] = percent of gold or higher level players who purchase the add-on

If the decision to purchase the add-on is independent of player level, the three percentages should be equal. This gives:

[latex]H_0:  P_B= P_S= P_G \leftarrow[/latex] percent who purchase the add-on is the same amongst all levels of players

[latex]H_A:[/latex] At least one of [latex]P_B, P_S, P_G[/latex] is not equal [latex]\leftarrow[/latex] percent who purchase the add-on is not the same amongst all levels of players.
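Under Method 1, the sample percentages that estimate [latex]P_B[/latex], [latex]P_S[/latex] and [latex]P_G[/latex] can be computed directly from the observed table. The Python sketch below does this (names such as `sample_props` are illustrative). The chi-squared test then asks whether the spread among these sample values is larger than sampling variation alone would plausibly produce.

```python
# Observed (purchased, did not purchase) counts per level.
observed = {
    "Bronze": (60, 40),
    "Silver": (67, 63),
    "Gold or Higher": (49, 41),
}

# Sample estimate of each P_level: purchasers / players at that level.
sample_props = {
    level: bought / (bought + not_bought)
    for level, (bought, not_bought) in observed.items()
}

for level, p in sample_props.items():
    print(f"{level}: {p:.4f}")
# Bronze: 0.6000
# Silver: 0.5154
# Gold or Higher: 0.5444
```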

Method 2

Alternatively, you can state the hypotheses in words:

[latex]H_0[/latex]: Likelihood to purchase add-on is independent of player level.

[latex]H_A[/latex]: Likelihood to purchase add-on is dependent on player level.

2. The Expected Values

We will use the following formula for our expected values:

\[Exp_i = \frac{\text{Row Total}\times \text{Column Total}}{\text{Sample Size}}\]

Let us calculate the expected values within the table:

| Level | Purchased Add-On | Did Not Purchase | Total |
|---|---|---|---|
| Bronze | [latex]\frac{100\times176}{320}=55[/latex] | [latex]\frac{100\times144}{320}=45[/latex] | 100 |
| Silver | [latex]\frac{130\times176}{320}=71.5[/latex] | [latex]\frac{130\times144}{320}=58.5[/latex] | 130 |
| Gold or Higher | [latex]\frac{90\times176}{320}=49.5[/latex] | [latex]\frac{90\times144}{320}=40.5[/latex] | 90 |
| Total | 176 | 144 | 320 |
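The expected-value formula applies the same arithmetic to every cell, so it is easy to reproduce in a few lines of Python. This is a minimal sketch, assuming the observed counts are stored as a list of rows in the order Bronze, Silver, Gold or Higher.

```python
# Observed counts: rows are levels, columns are (purchased, did not purchase).
observed = [
    [60, 40],  # Bronze
    [67, 63],  # Silver
    [49, 41],  # Gold or Higher
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Exp = (row total * column total) / sample size, for every cell.
expected = [[r * c / n for c in col_totals] for r in row_totals]

for row in expected:
    print(row)
# [55.0, 45.0]
# [71.5, 58.5]
# [49.5, 40.5]
```

A useful sanity check is that the expected counts keep the same row and column totals as the observed table.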

3. The χ² Test Statistic

We will use the following formula to calculate the χ² test statistic:

\[ \chi^2_{test} = \sum \frac{(obs - exp)^2}{exp} \]

Let us calculate each cell's contribution, [latex]\frac{(obs-exp)^2}{exp}[/latex], using the observed (actual) counts and the expected (exp) counts from the previous table:

| Level | Purchased Add-On | Did Not Purchase | Total |
|---|---|---|---|
| Bronze | [latex]\frac{(60-55)^2}{55}=0.4545[/latex] | [latex]\frac{(40-45)^2}{45}=0.5556[/latex] | 1.0101 |
| Silver | [latex]\frac{(67-71.5)^2}{71.5}=0.2832[/latex] | [latex]\frac{(63-58.5)^2}{58.5}=0.3462[/latex] | 0.6294 |
| Gold or Higher | [latex]\frac{(49-49.5)^2}{49.5}=0.0051[/latex] | [latex]\frac{(41-40.5)^2}{40.5}=0.0062[/latex] | 0.0112 |
| Total | 0.7428 | 0.9079 | 1.6507 |

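This gives [latex]\chi^2_{test} = 0.4545 + 0.5556 + 0.2832 + 0.3462 + 0.0051 + 0.0062 \approx 1.6507[/latex], the sum of all six cell contributions (the same value as the sum of the row totals or of the column totals). Adding the rounded values shown gives 1.6508; the small difference is due to rounding.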
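The statistic is a sum over the six data cells only, never over the row or column totals. The Python sketch below makes that explicit; the expected counts are copied from step 2, and names like `contributions` and `chi2_stat` are illustrative.

```python
observed = [[60, 40], [67, 63], [49, 41]]
expected = [[55.0, 45.0], [71.5, 58.5], [49.5, 40.5]]

# Contribution of each cell: (obs - exp)^2 / exp.
contributions = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]

# Test statistic: sum of the six cell contributions (totals are not added in).
chi2_stat = sum(sum(row) for row in contributions)

for row in contributions:
    print([round(x, 4) for x in row])
print(round(chi2_stat, 4))
# [0.4545, 0.5556]
# [0.2832, 0.3462]
# [0.0051, 0.0062]
# 1.6507
```

Because the sum is taken over unrounded contributions, it reproduces 1.6507 rather than the 1.6508 obtained by adding the four-decimal values by hand.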

4a. The Degrees of Freedom

To determine the p-value, we first need to calculate the degrees of freedom:

\[ \text{Degrees of Freedom} = df = (\text{number of rows} - 1)\times (\text{number of columns} - 1) = (3-1) \times (2-1) = 2 \times 1 = 2 \]

For the number of rows and columns, we only count the rows and columns that include values (not the totals or headers).
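Counting rows and columns by hand is easy to get wrong when the printed table includes a Total row and column. A small Python sketch that reads the dimensions from the observed counts alone makes the rule explicit; `chi2_df` is a hypothetical helper name.

```python
def chi2_df(n_rows: int, n_cols: int) -> int:
    """Degrees of freedom for an r x c contingency table (headers and totals excluded)."""
    return (n_rows - 1) * (n_cols - 1)


# Observed counts only: 3 levels (rows) by 2 purchase outcomes (columns).
observed = [[60, 40], [67, 63], [49, 41]]
df = chi2_df(len(observed), len(observed[0]))
print(df)  # 2
```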

4b. The P-Value

We can now calculate the p-value using Excel’s CHISQ.DIST.RT() function:

\[ \text{p-value} = \text{CHISQ.DIST.RT}(1.6507, 2) \approx 0.4381 \]
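If Excel is not at hand, the right-tail probability can be checked with the Python standard library. For an even number of degrees of freedom, the chi-squared upper tail has the closed form [latex]P(X > x) = e^{-x/2}\sum_{i=0}^{df/2-1} \frac{(x/2)^i}{i!}[/latex], which for df = 2 is simply [latex]e^{-x/2}[/latex]. The sketch below implements only that even-df case; `chi2_sf_even_df` is a hypothetical helper, not a library function.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Right-tail probability P(X > x) for a chi-squared variable with even df.

    Uses exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)**i / i!, which is exact
    only for even df. With df = 2 it reduces to exp(-x/2).
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive, even df")
    half = x / 2.0
    term = 1.0   # the i = 0 term, (x/2)**0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # turns (x/2)**(i-1)/(i-1)! into (x/2)**i/i!
        total += term
    return math.exp(-half) * total


p_value = chi2_sf_even_df(1.6507, 2)
print(round(p_value, 4))  # 0.4381
```

With the rounded statistic 1.6507 this prints 0.4381. For odd df, or as an independent cross-check, SciPy's `scipy.stats.chi2.sf(x, df)` computes the same right-tail probability for any df.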

5. The Decision

Because the p-value = 0.4381 is greater than (>) the level of significance (5%), we cannot reject H0.
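The decision rule can be written as a one-line comparison. As an optional cross-check (not part of the original solution), the equivalent critical-value comparison is shown too: for df = 2 the right-tail area beyond c is [latex]e^{-c/2}[/latex], so the 5% critical value is [latex]-2\ln(0.05)[/latex]. The sketch below hard-codes the values from the earlier steps.

```python
import math

alpha = 0.05        # level of significance from the question
p_value = 0.4381    # p-value from step 4b
chi2_stat = 1.6507  # test statistic from step 3

# p-value approach: reject H0 only when the p-value is below alpha.
reject_h0 = p_value < alpha
print("Reject H0" if reject_h0 else "Do not reject H0")  # Do not reject H0

# Critical-value approach (valid as written only for df = 2):
# the area to the right of c is exp(-c/2), so c = -2 * ln(alpha).
critical_value = -2 * math.log(alpha)
print(round(critical_value, 4))    # 5.9915
print(chi2_stat > critical_value)  # False, so again H0 is not rejected
```

Both approaches always agree: a statistic below the critical value corresponds to a p-value above alpha.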

6. The Conclusion

Because we cannot reject H0, there is not sufficient evidence to conclude that whether a player purchases the add-on depends on the player’s level. What does this mean for the analysts? They should not use the player’s level when deciding whom to target with promotions for the add-on pack.

Excel Solutions (VIDEO)

Let us now perform all of the calculations using Excel:


License


An Introduction to Business Statistics for Analytics (1st Edition) Copyright © 2024 by Amy Goldlist; Charles Chan; Leslie Major; Michael Johnson is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, except where otherwise noted.
