Frequency Distributions and Visualizing Data
Frequency Distributions
Learning Objectives
Construct and understand frequency distributions
A frequency distribution is an organized tabulation or graphical representation of the number of data in each category.
- It allows us to take in the entire data set at a glance.[1]
- See the sample frequency table below (created from student survey data from 2019):
Class | Frequency | % Frequency |
---|---|---|
143 – 149 | 1 | 0.57% |
150 – 156 | 13 | 7.43% |
157 – 163 | 32 | 18.29% |
164 – 170 | 47 | 26.86% |
171 – 177 | 26 | 14.86% |
178 – 184 | 44 | 25.14% |
185 – 191 | 9 | 5.14% |
192 – 198 | 3 | 1.71% |
Totals | 175 | 100% |
Steps to Construct A Frequency Distribution
To construct a frequency table, follow these steps:
- Find the MIN and MAX values to determine the range of the data
- Choose the number of classes/intervals to divide the range into (more on this below)
- Calculate the width of each class (round to the closest ‘convenient’ number): \[ \text{Width (}W\text{)} = \frac{\text{MAX} - \text{MIN}}{\text{Number of classes (}C\text{)}} \]
- Pick an ARBITRARY starting point (less than or equal to the MIN) to serve as the first lower limit (LL), i.e.: \[ \text{First Lower Limit (}LL_1\text{)} = \text{Starting Point}\]
- Calculate lower limits. \[ \text{Lower Limit (}LL\text{)} = \text{Previous Lower Limit} + \text{Width (}W\text{)} \]
- Determine the precision level in data. Ex: precision = 2 decimal places, or 0.01, for money.
- Subtract the precision level from the next lower limit to calculate each upper limit. \[ \text{Upper Limit (}UL\text{)} = \text{Next Lower Limit } - \text{ Precision Level (ex: 0.01)} \]
- Count the number of data in each class. In Excel, use a Pivot Table or use =COUNTIF().
- Calculate the percent of data in each class by dividing by the total number of data: \[\text{% frequency}= \frac{\text{frequency (}f\text{)}}{\text{sample size (} n \text{)}} \]
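The limit-building steps above can be sketched in Python. This is only a minimal illustration under stated assumptions: the function name `class_limits` and its parameters are hypothetical (they do not come from the text), and the inputs in the usage line are made up for demonstration.

```python
def class_limits(start, width, n_classes, precision):
    """Return a list of (lower_limit, upper_limit) pairs, one per class.

    start     -- first lower limit (arbitrary, at or below the MIN)
    width     -- class width W, already rounded to a convenient number
    n_classes -- number of classes C
    precision -- precision level of the data (1 for whole numbers)
    """
    limits = []
    for i in range(n_classes):
        lower = start + i * width       # previous lower limit + width
        next_lower = lower + width      # lower limit of the following class
        upper = next_lower - precision  # next lower limit - precision level
        limits.append((lower, upper))
    return limits


# Hypothetical inputs: start at 0, width 5, 4 classes, whole-number data
print(class_limits(0, 5, 4, 1))  # [(0, 4), (5, 9), (10, 14), (15, 19)]
```

For data recorded to two decimal places (such as money), passing floating-point values like 0.01 may introduce small rounding errors, so working in whole cents is one way to keep the limits exact.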
Determining number of classes
If there are too many class intervals:
- There is no reduction in the “bulkiness” of data
- And minor deviations also become noticeable[2]
If there are too few class intervals:
- The shape of the distribution itself cannot be determined
- Generally, 6–14 intervals are adequate[3].
See the table below, based on the 2^k rule (choose the smallest number of classes k such that 2^k is at least the sample size), for suggested numbers of classes:
Sample Size | Number of Classes |
---|---|
9–16 | 4 |
17–32 | 5 |
33–64 | 6 |
65–128 | 7 |
129–256 | 8 |
257–512 | 9 |
513–1,024 | 10 |
1,025–2,048 | 11 |
2,049–4,096 | 12 |
4,097–8,192 | 13 |
8,193–16,384 | 14 |
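The table above is consistent with picking the smallest k for which 2^k is at least the sample size n. A short Python sketch of that reading follows; the function name `suggested_classes` is a hypothetical helper, not something defined in the text.

```python
def suggested_classes(n):
    """Smallest k such that 2**k >= n (one reading of the 2^k rule)."""
    if n < 1:
        raise ValueError("sample size must be at least 1")
    k = 0
    while 2 ** k < n:
        k += 1
    return k


# Sample sizes on either side of a few table boundaries
for n in (16, 17, 30, 175, 1024, 1025):
    print(n, suggested_classes(n))
# 16 4
# 17 5
# 30 5
# 175 8
# 1024 10
# 1025 11
```

The table starts at a sample size of 9, so results for smaller samples fall outside what the text suggests.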
Creating Frequency Tables ‘Manually’
Example 9.1.1
Problem Setup: Suppose we have taken a SAMPLE of 30 BCIT students and asked them for their heights, in centimeters (click here to download the values as an Excel sheet):[4]
173 | 153 | 172 | 191 | 173 | 167 | 156 | 169 | 175 | 169 |
159 | 163 | 177 | 155 | 152 | 178 | 172 | 188 | 152 | 171 |
174 | 183 | 192 | 151 | 159 | 184 | 170 | 186 | 155 | 156 |
Question: What is the frequency table for these heights?
Steps: We will follow the steps outlined in the first section:
- Range: [latex]MAX = 192[/latex], [latex]MIN = 151[/latex]
- Number of classes: [latex]C = 5[/latex]
- Width: [latex]W = \frac{MAX - MIN}{C} = \frac{192 - 151}{5} = 8.2[/latex] (rounded up to 10, which is ‘easier’ to work with)
- Starting point: [latex]LL_1 = 150[/latex] (150 is less than min value of 151 and ‘easier’ to work with)
- Lower Limits:
[latex]\begin{align*} LL_2 &= LL_1 + W = 150 + 10 = 160\\ LL_3 &= LL_2 + W = 160 + 10 = 170\\ ... \end{align*}[/latex]
- Precision level: Data recorded to the nearest 1 (whole centimeter)
- Upper Limits:
[latex]\begin{align*} UL_1 &= LL_2 - \text{Precision} = 160 - 1 = 159\\ UL_2 &= LL_3 - \text{Precision} = 170 - 1 = 169\\ ... \end{align*}[/latex]
- Count the data in each class and enter the counts in the “Frequency” column (see the table below).
- Divide each frequency by the total number of data to get the relative (%) frequencies.
Resulting Frequency Distribution: Below is the resulting frequency table/distribution:
Class | Frequency | % Frequency |
---|---|---|
150 – 159 | 10 | 33.33% |
160 – 169 | 4 | 13.33% |
170 – 179 | 10 | 33.33% |
180 – 189 | 4 | 13.33% |
190 – 199 | 2 | 6.67% |
Totals | 30 | 100% |
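As a cross-check, the counting and percentage steps can be reproduced with a few lines of Python. This is a sketch rather than the method used in the text (which relies on manual counting or Excel); the variable names are illustrative, while the data and the choices of starting point, width, and precision are taken from the example.

```python
heights = [
    173, 153, 172, 191, 173, 167, 156, 169, 175, 169,
    159, 163, 177, 155, 152, 178, 172, 188, 152, 171,
    174, 183, 192, 151, 159, 184, 170, 186, 155, 156,
]

# Choices made in the example: LL_1 = 150, W = 10, C = 5, precision = 1
start, width, n_classes, precision = 150, 10, 5, 1

n = len(heights)
table = []
for i in range(n_classes):
    lower = start + i * width
    upper = lower + width - precision
    # Count the data that fall between the class limits (inclusive)
    freq = sum(1 for h in heights if lower <= h <= upper)
    table.append((lower, upper, freq, 100 * freq / n))

for lower, upper, freq, pct in table:
    print(f"{lower} - {upper} | {freq} | {pct:.2f}%")
print(f"Totals | {sum(row[2] for row in table)} | 100%")
```

The frequencies come out as 10, 4, 10, 4, and 2, which sum to the sample size of 30.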
Creating Frequency Tables in Excel (VIDEO)
Example 9.1.2
Problem Setup: Let us now revisit the 30 students’ heights example from the previous section, but this time, using Excel. Click here to download the file shown in the video below.
Solution: See the video for the solution to this problem. Also, click here to download the solution file.
Creating Frequency Tables using Pivot Tables (VIDEO)
Example 9.1.3
Problem Setup: Let us again revisit the 30 students’ heights example from the previous section, but this time, using a Pivot Table. Click here to download the file shown in the video below.
Solution: See the video for the solution to this problem. Also, click here to download the solution file.
Class Frequencies, Boundaries, Limits, Marks and Width
Frequencies
- It is important to be able to interpret what this frequency distribution tells us.
- Keep in mind that the FREQUENCY refers to the number of data in each class.
- In a sample of 30 BCIT students, 10 have heights between 150 and 159 cm, 4 between 160 and 169 cm, 10 between 170 and 179 cm, etc.
Relative Frequencies
- The frequency as a percent of the total
- We obtain the relative frequency of a class by dividing its frequency by the total frequency, i.e., f/n
Class Boundaries
- The CLASS BOUNDARIES are the mid-points BETWEEN the classes.
- In Example 9.1.1, the U.L. of the first class is 159, and the L.L. of the second class is 160.
- The boundary between the first and second class is then 159.5.
- Similarly, 169.5 is the next boundary.
Class Marks
- The CLASS MARKS are the mid-points WITHIN the classes.
- The mid-point of the first class in Example 9.1.1 is [latex]\frac{150 + 159} {2} = 154.5[/latex].
- The mid-point of the second class is [latex]\frac{160 + 169} {2} = 164.5[/latex]. The rest then are 174.5, 184.5, and 194.5.
- There are as many class marks as there are classes.
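Both kinds of mid-point can be computed directly from the class limits. The short Python sketch below uses the limits from the heights example; the variable names are illustrative only.

```python
# Class limits (lower, upper) from the heights example
limits = [(150, 159), (160, 169), (170, 179), (180, 189), (190, 199)]

# Boundary: mid-point BETWEEN one class's upper limit and the next class's lower limit
boundaries = [
    (limits[i][1] + limits[i + 1][0]) / 2
    for i in range(len(limits) - 1)
]

# Mark: mid-point WITHIN a class, halfway between its own lower and upper limits
marks = [(lower + upper) / 2 for lower, upper in limits]

print(boundaries)  # [159.5, 169.5, 179.5, 189.5]
print(marks)       # [154.5, 164.5, 174.5, 184.5, 194.5]
```

This sketch lists only the boundaries between adjacent classes, so it yields one fewer boundary than there are classes, while there is exactly one mark per class.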
Class Width
The class width is NOT the difference between the upper and lower limits of the same class. Rather, it can be calculated by taking the difference between any of the following successive values:
- lower limits (ex: 160 − 150)
- upper limits (ex: 169 − 159)
- boundaries (ex: 159.5 − 149.5)
- marks (ex: 164.5 − 154.5)
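A quick Python check, again using the limits from the heights example, illustrates that each of these successive differences gives the same width, while the upper limit minus the lower limit of a single class does not. The helper name `successive_differences` is hypothetical.

```python
limits = [(150, 159), (160, 169), (170, 179), (180, 189), (190, 199)]

lower_limits = [lower for lower, _ in limits]
upper_limits = [upper for _, upper in limits]
marks = [(lower + upper) / 2 for lower, upper in limits]
boundaries = [
    (limits[i][1] + limits[i + 1][0]) / 2
    for i in range(len(limits) - 1)
]


def successive_differences(values):
    """Differences between each value and the one that follows it."""
    return [b - a for a, b in zip(values, values[1:])]


print(successive_differences(lower_limits))  # [10, 10, 10, 10]
print(successive_differences(upper_limits))  # [10, 10, 10, 10]
print(successive_differences(boundaries))    # [10.0, 10.0, 10.0]
print(successive_differences(marks))         # [10.0, 10.0, 10.0, 10.0]

# Upper limit minus lower limit of the SAME class falls short by the precision level
print(limits[0][1] - limits[0][0])           # 9
```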
- [1] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3117575/#CIT1
- [2] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3117575/#CIT2
- [3] Dawson B, Trapp RG. Basic and Clinical Biostatistics. 4th ed. New York: McGraw Hill; 2004.
- [4] We used to survey the students and ask 25 questions. See the last survey results, from 2019, collected here: https://docs.google.com/spreadsheets/d/14hmeeeEvaI-3uwUDAwSot0l6UqYlurgwMKCMHb1CSy8/edit?usp=sharing