Probability Rules
Contingency Tables
Learning Objectives
Construct and understand contingency tables.
A Contingency Table:
- Describes the relationship between categories.
- Also known as ‘crosstabs’ in marketing.
- “The categories of one variable determine the rows of the table.
- The categories of the other variable determine the columns.”[1]
- “Heavily used in survey research, business intelligence, engineering, and scientific research.”[2]
Constructing a Contingency Table:
- The ‘outsides’ (totals) of the table are the ‘singular’ (or total) probabilities
- Inside the table are the intersections of categories (or ‘ANDs’)
- The table below is for two events, A and B, that can each either happen or not happen
- Because there are only 2 options for both A and B, this table is also called a 2×2 table
- Some events can have more than 2 possible options (see examples later in this section)
| | A | not A | Totals |
|---|---|---|---|
| B | P(A and B) | P(Ā and B) | P(B) |
| not B | P(A and B̅) | P(Ā and B̅) | P(B̅) |
| Totals | P(A) | P(Ā) | 1 |
Where:
- P(Ā) = P(not A) = 1 − P(A)
- P(B̅) = P(not B) = 1 − P(B)
Symbol notation in contingency tables
The contingency (or crosstabs) table can also be written using symbols:
| | A | Ā | Totals |
|---|---|---|---|
| B | P(A ∩ B) | P(Ā ∩ B) | P(B) |
| B̅ | P(A ∩ B̅) | P(Ā ∩ B̅) | P(B̅) |
| Totals | P(A) | P(Ā) | 1 |
where the symbols shown above mean the following:
- ∩ = intersection (or ‘AND’)
- P(Ā) = P(not A) = 1 − P(A)
- P(B̅) = P(not B) = 1 − P(B)
- P(A ∩ B) = P(A and B)
- P(Ā ∩ B) = P(not A and B)
- P(A ∩ B̅) = P(A and not B)
- P(Ā ∩ B̅) = P(not A and not B)
Deconstructing Contingency Tables
The Totals
In the above table, we can calculate the total (singular) probabilities:
- P(A) = P(A and B) + P(A and B̅)
- P(Ā) = P(Ā and B) + P(Ā and B̅)
- P(B) = P(A and B) + P(Ā and B)
- P(B̅) = P(A and B̅) + P(Ā and B̅)
These ‘singular’ probabilities give the overall chance of A or B happening (or not happening).
Note: The overall total (bottom right box) should always equal 1 (i.e., 100%).
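The row and column sums above can be sketched in a few lines of Python. This is only an illustration: the four inside probabilities are hypothetical values chosen so that they add up to 1, and the names `joint` and `marginals` are assumptions made for this sketch.

```python
# Hypothetical inside (joint) probabilities for a 2x2 table.
# These values are illustrative only; they are not taken from the examples in this section.
joint = {
    ("A", "B"): 0.10,
    ("A", "not B"): 0.20,
    ("not A", "B"): 0.30,
    ("not A", "not B"): 0.40,
}


def marginals(joint):
    """Add up the inside cells to get each total (singular) probability."""
    totals = {}
    for (a_state, b_state), p in joint.items():
        # Each inside cell contributes to one column total (A or not A)
        # and to one row total (B or not B).
        totals[a_state] = totals.get(a_state, 0.0) + p
        totals[b_state] = totals.get(b_state, 0.0) + p
    return totals


totals = marginals(joint)
grand_total = sum(joint.values())  # the bottom-right box, which should be 1

for name, p in totals.items():
    print(f"P({name}) = {p:.2f}")
print(f"Overall total = {grand_total:.2f}")
```

If the overall total does not come out to 1, at least one of the inside probabilities has been entered incorrectly.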
Inside Probabilities (Intersections)
The ‘inside’ of the table contains the ‘overlaps’ (intersections) between the categories:
- P(A and B) = the probability of both A and B occurring
- P(A and B̅) = the probability of A occurring and B not occurring
- P(Ā and B) = the probability of A not occurring and B occurring
- P(Ā and B̅) = the probability of A not occurring and B not occurring
Calculating The ‘ANDs’ (Exercise)
The ANDs can be calculated using the conditional probabilities (‘givens’):
- P(A and B) = P(A|B)×P(B) = P(B|A)×P(A)
- P(A and B̅) = P(A|B̅)×P(B̅) = P(B̅|A)×P(A)
- P(Ā and B) = P(Ā|B)×P(B) = P(B|Ā)×P(Ā)
- P(Ā and B̅) = P(Ā|B̅)×P(B̅) = P(B̅|Ā)×P(Ā)
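To see why both versions of each formula give the same answer, here is a small Python sketch. It assumes hypothetical values for P(A), P(B) and P(A and B) (they are not from the examples in this section), works out the two ‘givens’ from them, and then multiplies back.

```python
# Hypothetical probabilities, for illustration only.
p_a = 0.30        # P(A)
p_b = 0.40        # P(B)
p_a_and_b = 0.10  # P(A and B)

# Conditional probabilities ('givens') follow from the definition
# P(A|B) = P(A and B) / P(B) and P(B|A) = P(A and B) / P(A).
p_a_given_b = p_a_and_b / p_b
p_b_given_a = p_a_and_b / p_a

# Multiplying back recovers the same AND either way.
via_b = p_a_given_b * p_b  # P(A|B) x P(B)
via_a = p_b_given_a * p_a  # P(B|A) x P(A)

print(f"P(A|B) x P(B) = {via_b:.3f}")
print(f"P(B|A) x P(A) = {via_a:.3f}")
```

Which version to use depends on which conditional probability the problem gives you.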
Example 17.1.1
Problem Setup: Let us examine the effectiveness of two social media marketing campaigns:
- Let’s call them campaign A and campaign B
- In each campaign, people are shown an ad
- If someone clicks on the link provided after looking at the ad, we say they ‘click through’
The percentage who click through on each ad is called the ‘click-through rate’ (CTR):
- The CTR (click-through rate) for campaign A is 2%
- The CTR (click-through rate) for campaign B is 5%
- If someone has already clicked through on ad A, the CTR for campaign B rises to 15%
Question: What is the probability of someone clicking through after both ads?
You Try: Can you write out the probabilities above using ‘stats notation’?
Answers: P(A) = 0.02, P(B) = 0.05, P(B|A) = 0.15
Calculating the AND:
We can now calculate the probability of someone clicking through after both ads:
P(A and B) = P(B|A) × P(A) = 0.15 × 0.02 = 0.003
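The same calculation can be checked in Python. This is a minimal sketch using the values from the problem setup; the variable names are assumptions made for illustration.

```python
p_a = 0.02          # P(A): click-through rate for campaign A
p_b_given_a = 0.15  # P(B|A): click-through rate for campaign B, given A

# P(A and B) = P(B|A) x P(A)
p_a_and_b = p_b_given_a * p_a

print(round(p_a_and_b, 4))  # 0.003
```

In other words, about 0.3% of people would be expected to click through on both ads.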
Setting Up the Table (Exercise)
It is also possible to calculate missing values in the table. We only need to know the following:
- one or two of the totals
- some of the inside probabilities
We can calculate the rest knowing that the ‘totals’ are the sums across the rows and down the columns.
Example 17.1.2
Problem Setup: Let us continue with the two marketing campaigns example…
Question: Can you set up the contingency (cross-tabs) table for this problem?
Solution: Let us first calculate the complements for A and B. These will be the probabilities of people NOT clicking through after seeing the ad:
- Campaign A’s probability that someone does NOT click through = P(Ā) = 1 − P(A) = 1 − 0.02 = 0.98
- Campaign B’s probability that someone does NOT click through = P(B̅) = 1 − P(B) = 1 − 0.05 = 0.95
You try: Can you add the above values and the values in Example 17.1.1 to the table?
Solutions to Example 17.1.2
| | A | not A | Totals |
|---|---|---|---|
| B | 0.003 | | 0.05 |
| not B | | | 0.95 |
| Totals | 0.02 | 0.98 | 1 |
Calculating Missing Values in the Table (Exercise)
- We can use the fact that we sum across the rows and columns to determine the totals.
- We can work backwards and subtract values from the totals to get the missing values.
- Let’s try this by continuing with the ad campaign example.
Example 17.1.3
Problem Setup: Let us continue with the two marketing campaigns crosstabs (contingency) table…
Question: Can you calculate the missing values in the table we built in Example 17.1.2?
You try: Calculate the missing values where needed and complete the CTR crosstabs table.
Solutions to Example 17.1.3
| | A | not A | Totals |
|---|---|---|---|
| B | 0.003 | 0.05 − 0.003 = 0.047 | 0.05 |
| not B | 0.02 − 0.003 = 0.017 | 0.98 − (0.05 − 0.003) = 0.933 | 0.95 |
| Totals | 0.02 | 0.98 | 1 |
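The ‘work backwards from the totals’ steps in Example 17.1.3 can also be sketched in Python. The function name `complete_2x2` and the dictionary keys are assumptions made for this illustration; the inputs are the values from the ad campaign example.

```python
def complete_2x2(p_a, p_b, p_a_and_b):
    """Fill in a 2x2 contingency table from P(A), P(B) and P(A and B)."""
    # Complements give the two remaining totals.
    p_not_a = 1 - p_a
    p_not_b = 1 - p_b

    # Subtract the known inside value from each total to get the missing cells.
    p_not_a_and_b = p_b - p_a_and_b              # row B
    p_a_and_not_b = p_a - p_a_and_b              # column A
    p_not_a_and_not_b = p_not_a - p_not_a_and_b  # column not A

    return {
        "A and B": p_a_and_b,
        "not A and B": p_not_a_and_b,
        "A and not B": p_a_and_not_b,
        "not A and not B": p_not_a_and_not_b,
        "P(A)": p_a,
        "P(not A)": p_not_a,
        "P(B)": p_b,
        "P(not B)": p_not_b,
    }


# Values from the two marketing campaigns example.
table = complete_2x2(p_a=0.02, p_b=0.05, p_a_and_b=0.003)

for label, p in table.items():
    print(f"{label}: {p:.3f}")
```

As a check, the two cells in the ‘not B’ row (0.017 and 0.933) should add up to P(B̅) = 0.95.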