Chi-Squared Test of Independence
Chi-Squared Test Example
Learning Objectives
Solve a Chi-Squared Test for Independence problem using both manual and Excel calculations.
Let us now ‘dive in’ to an example that uses a Chi-Squared Test for Independence. Before we do, let us recap the steps of the Chi-Squared Test for Independence:
- State H0 and HA
- Calculate the expected frequencies (values)
- Calculate the χ² test statistic
- Compute the p-value
- Make a decision
- Draw a conclusion
Example 67.1
Problem Setup: Analysts who work on the popular game ‘Fortnight’ are trying to determine which players to target with in-app purchase offers. In particular, they want to promote a premium add-on pack to users. They wonder whether a player’s level in the game might influence the likelihood of purchasing the add-on. The table below shows the purchase results by level for a random sample of players:
Level | Purchased Add-On | Did Not Purchase | Total |
---|---|---|---|
Bronze | 60 | 40 | 100 |
Silver | 67 | 63 | 130 |
Gold or Higher | 49 | 41 | 90 |
Total | 176 | 144 | 320 |
Question: Does the player’s level affect whether they will purchase the premium add-on pack? Test at the 5% level of significance.
You Try: Try setting up and solving this problem yourself, then check your work against the solution steps below.
1. The Hypotheses
The hypotheses for a contingency table question always follow the same format. The null hypothesis reflects the idea that there is no difference between the groups with respect to preference (i.e., independence). The alternative reflects the idea that there is a difference between the groups with respect to preference (i.e., dependence). There are two common ways to state the hypotheses:
Method 1
Let us define the percentage of players at each level who purchase the add-on and state the hypotheses in terms of these percentages:
- [latex]P_B[/latex] = percent of bronze level players who purchase the add-on
- [latex]P_S[/latex] = percent of silver level players who purchase the add-on
- [latex]P_G[/latex] = percent of gold or higher level players who purchase the add-on
If whether or not a player purchases the add-on is independent of their level, these percentages should all be equal:
[latex]H_0: P_B= P_S= P_G \leftarrow[/latex] percent who purchase the add-on is the same amongst all levels of players
[latex]H_A:[/latex] At least one of [latex]P_B, P_S, P_G[/latex] is not equal [latex]\leftarrow[/latex] percent who purchase the add-on is not the same amongst all levels of players.
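To make Method 1 concrete, the sample percentages that [latex]H_0[/latex] claims share a common population value can be computed directly from the observed table. This is a minimal Python sketch; the dictionary layout and the function name `sample_proportions` are illustrative choices, not part of the original example.

```python
# Observed counts from the example: (purchased, did not purchase) per level.
observed = {
    "Bronze": (60, 40),
    "Silver": (67, 63),
    "Gold or Higher": (49, 41),
}

def sample_proportions(table):
    """Return the fraction of players at each level who purchased the add-on."""
    return {level: bought / (bought + not_bought)
            for level, (bought, not_bought) in table.items()}

for level, p in sample_proportions(observed).items():
    print(f"{level}: {p:.4f}")
```

The sample proportions (0.6000, 0.5154 and 0.5444) are not identical; the test asks whether differences this large could plausibly arise by chance if [latex]H_0[/latex] were true.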
Method 2:
Alternatively, you can formulate the hypotheses this way:
[latex]H_0[/latex]: The likelihood of purchasing the add-on is independent of player level.
[latex]H_A[/latex]: The likelihood of purchasing the add-on is dependent on player level.
2. The Expected Values
We will use the following formula for our expected values:
\[Exp_i = \frac{\text{Row Total}\times \text{Column Total}}{\text{Sample Size}}\]
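Because the formula is applied cell by cell, it translates directly into a loop over rows and columns. The sketch below assumes the table is stored as a list of rows with the totals row and totals column left out; the function name `expected_counts` is illustrative.

```python
def expected_counts(observed):
    """Expected count for each cell under independence:
    (row total * column total) / sample size."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Rows: Bronze, Silver, Gold or Higher; columns: purchased, did not purchase.
observed = [[60, 40], [67, 63], [49, 41]]
for row in expected_counts(observed):
    print(row)
# [55.0, 45.0]
# [71.5, 58.5]
# [49.5, 40.5]
```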
Let us calculate the expected values within the table:
Level | Purchased Add-On | Did Not Purchase | Total |
---|---|---|---|
Bronze | [latex]\frac{100\times176}{320}=55[/latex] | [latex]\frac{100\times144}{320}=45[/latex] | 100 |
Silver | [latex]\frac{130\times176}{320}=71.5[/latex] | [latex]\frac{130\times144}{320}=58.5[/latex] | 130 |
Gold or Higher | [latex]\frac{90\times176}{320}=49.5[/latex] | [latex]\frac{90\times144}{320}=40.5[/latex] | 90 |
Total | 176 | 144 | 320 |
3. The χ² Test Statistic
We will use the following formula to calculate the χ² test statistic:
\[ \chi^2_{test} = \sum \frac{(obs - exp)^2}{exp} \]
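The sum runs over every cell of the table (here 3 levels × 2 outcomes = 6 cells). A minimal Python sketch of the formula, assuming the observed and expected tables are stored as lists of rows in the same order; `chi_squared_statistic` is an illustrative name.

```python
def chi_squared_statistic(observed, expected):
    """Sum of (obs - exp)^2 / exp over every cell of the table."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

observed = [[60, 40], [67, 63], [49, 41]]
expected = [[55, 45], [71.5, 58.5], [49.5, 40.5]]  # values from step 2
print(round(chi_squared_statistic(observed, expected), 4))  # 1.6507
```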
Let us calculate each cell’s contribution to the test statistic using the observed (obs) and expected (exp) values within the table:
Level | Purchased Add-On | Did Not Purchase | Total |
---|---|---|---|
Bronze | [latex]\frac{(60-55)^2}{55}=0.4545[/latex] | [latex]\frac{(40-45)^2}{45}=0.5556[/latex] | 1.0101 |
Silver | [latex]\frac{(67-71.5)^2}{71.5}=0.2832[/latex] | [latex]\frac{(63-58.5)^2}{58.5}=0.3462[/latex] | 0.6294 |
Gold or Higher | [latex]\frac{(49-49.5)^2}{49.5}=0.0051[/latex] | [latex]\frac{(41-40.5)^2}{40.5}=0.0062[/latex] | 0.0112 |
Total | 0.7428 | 0.9079 | 1.6507 |
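Summing all six cell contributions gives [latex]\chi^2_{test} = 0.4545 + 0.5556 + 0.2832 + 0.3462 + 0.0051 + 0.0062 \approx 1.6507[/latex] (the rounded terms add to 1.6508; the unrounded sum is 1.6507).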
4a. The Degrees of Freedom
To determine the p-value, we first need to calculate the degrees of freedom:
\[ \text{Degrees of Freedom} = df = (\text{number of rows} - 1)\times (\text{number of columns} - 1) = (3-1) \times (2-1) = 2 \times 1 = 2 \]
For the number of rows and columns, we only count the rows and columns that include values (not the totals or headers).
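Counting only the data cells is easy to get right in code if the table is stored without its totals. A minimal sketch under that assumption; `degrees_of_freedom` is an illustrative name.

```python
def degrees_of_freedom(observed):
    """(number of rows - 1) * (number of columns - 1), counting only the
    data cells, never the totals row or totals column."""
    n_rows = len(observed)
    n_cols = len(observed[0])
    return (n_rows - 1) * (n_cols - 1)

observed = [[60, 40], [67, 63], [49, 41]]  # 3 levels x 2 outcomes, no totals
print(degrees_of_freedom(observed))  # 2
```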
4b. The P-Value
We can now calculate the p-value using Excel’s CHISQ.DIST.RT() function:
\[ \text{p-value} = \text{CHISQ.DIST.RT}(1.6507, 2) = 0.438083\]
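Outside Excel, the same right-tail probability can be computed from the chi-squared survival function. The sketch below uses only Python’s standard library; `chi2_sf` is a hypothetical helper (not a built-in) that evaluates the regularized lower incomplete gamma function by its power series and subtracts it from 1. For 2 degrees of freedom this reduces to [latex]e^{-x/2}[/latex]. For real analyses, a library routine such as SciPy’s `scipy.stats.chi2.sf` is the safer choice.

```python
import math

def chi2_sf(x, df):
    """Right-tail probability P(X >= x) for a chi-squared variable with df
    degrees of freedom, the quantity Excel's CHISQ.DIST.RT(x, df) returns.

    Computes 1 - P(s, z), where P is the regularized lower incomplete gamma
    function with s = df / 2 and z = x / 2, evaluated by its power series.
    """
    if x <= 0:
        return 1.0
    s, z = df / 2, x / 2
    term = 1 / s  # n = 0 term of sum_n z^n / (s (s+1) ... (s+n))
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (s + n)
        total += term
        n += 1
    # P(s, z) = z^s * e^(-z) / Gamma(s) * total, computed in log space
    lower = math.exp(s * math.log(z) - z - math.lgamma(s)) * total
    return 1 - lower

print(round(chi2_sf(1.6507, 2), 4))  # 0.4381
```

Because the result is formed as 1 minus a value close to 1, this simple sketch loses relative accuracy for very large test statistics (tiny p-values); it is adequate for values like the one in this example.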
5. The Decision
Because the p-value (0.4381) is greater than the level of significance (0.05), we cannot reject H0.
6. The Conclusion
Because we cannot reject H0, there is not sufficient evidence to conclude that whether or not a player purchases the add-on depends on the player’s level. What does this mean for the analysts? Based on this sample, player level does not appear to be a useful basis for deciding whom to target when promoting the add-on pack.
Excel Solutions (VIDEO)
Let us now perform all of the calculations using Excel:
Click here to download the Excel solutions for this problem
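For readers without Excel, the whole procedure can also be cross-checked in a short standard-library Python script. This is a sketch under a stated assumption: to stay exact without a statistics library, it uses the closed-form right-tail probability [latex]P(X > x) = e^{-x/2}\sum_{k=0}^{df/2-1} \frac{(x/2)^k}{k!}[/latex], which holds only for an even number of degrees of freedom (such as the df = 2 here). The function name `chi_squared_independence_test` is illustrative.

```python
import math

def chi_squared_independence_test(observed, alpha=0.05):
    """Chi-squared test of independence on a table of observed counts
    (rows and columns exclude the totals). Returns the test statistic,
    degrees of freedom, p-value, and whether H0 is rejected at level alpha."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    # Steps 2-3: expected counts and the chi-squared statistic.
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected

    # Step 4a: degrees of freedom.
    df = (len(observed) - 1) * (len(observed[0]) - 1)

    # Step 4b: right-tail p-value; this closed form is valid for even df only.
    if df % 2 != 0:
        raise ValueError("this sketch only handles even degrees of freedom")
    half = stat / 2
    p_value = math.exp(-half) * sum(half ** k / math.factorial(k)
                                    for k in range(df // 2))

    # Step 5: reject H0 when the p-value is at or below alpha.
    return stat, df, p_value, p_value <= alpha

observed = [[60, 40], [67, 63], [49, 41]]
stat, df, p_value, reject = chi_squared_independence_test(observed)
print(f"chi2 = {stat:.4f}, df = {df}, p = {p_value:.4f}, reject H0: {reject}")
# chi2 = 1.6507, df = 2, p = 0.4381, reject H0: False
```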