Chapter 3 Measures of Central Tendency

3.4 Mean

The third, and final, measure of central tendency is one you have undoubtedly encountered before. It is one that most people have had to calculate at least a few times in their lives, and that everyone has heard reported about one thing or another. You most likely know it by its common name, the average.

 

Recall that the measures of central tendency provide information about the typical cases, or where cases tend to centre in a variable’s distribution. Thus a student’s Grade Point Average (GPA) measures how well they do academically not in one class but, on average, across all of them; a hockey player’s season points average measures their performance on the ice not in just one game but over a whole season; and a monthly average temperature indicates what the typical weather is for a given month. All of these averages show what is typical or expected.

 

The mean of a variable is therefore, quite simply put, the mathematical average of the values of the variable’s cases. Reported alongside the mode and the median, it provides a fuller picture of where the cases tend to cluster, or what the typical cases are. The mode does this in the simplest way, by counting how often each value occurs and reporting the most frequent one. The median does this by identifying the most centrally located case once the cases are put in order.

 

Unlike the mode and the median, however, the mean takes into account the actual values of the cases.

 

Keeping the last sentence in mind, do you think the mean applies to any and all variables? If you have been paying attention, you will know that the answer is “no, of course not”.

 

Nominal and ordinal variables have categories; only interval/ratio variables have actual numerical values. Therefore, the mean applies only to interval/ratio variables. After all, mathematical calculations are only possible when we have numbers with which to do them: we cannot calculate an average of gender, or of race/ethnicity, or of religious affiliation, etc.[1] We could, however, calculate an average age, income, score, temperature, etc.

 

If you have ever calculated your GPA, you already know how to calculate the mean. I will still give you an example to strengthen your knowledge.

 

Example 3.4 (A) Mean of Number of Siblings, Raw data

 

If you recall our Example 3.3 (A) from the previous Section 3.2 (https://pressbooks.bccampus.ca/simplestats/chapter/3-2-median/), you imagined yourself asking seven of your friends about the number of siblings they had. We imagined the responses as follows: 2, 1, 4, 2, 1, 0, 3. We had to put these values in order to find the median, but the mean works either way, whether the values are in order or not.

 

To calculate the average number of siblings your imagined friends have, we simply add all the responses together and divide the sum by the total number of friends, i.e., by 7:

 

    \[\frac{(2+1+4+2+1+0+3)}{7}=\frac{13}{7}=1.86\]

 

That is, your imagined friends have 1.86 siblings on average (a little under two, and closer to two than to one). We could also say that the mean number of siblings is 1.86.
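If you want to check this arithmetic on a computer, the short Python sketch below (Python is our own choice of tool for illustration) adds the seven responses and divides their sum by their count. It also confirms the point made above that the order of the values does not matter: sorting them first, as we did for the median, gives exactly the same mean. The variable names are our own.

```python
# Illustrative sketch; variable names are our own.
# Number of siblings reported by the seven imagined friends (Example 3.4 (A))
siblings = [2, 1, 4, 2, 1, 0, 3]

total = sum(siblings)        # 2 + 1 + 4 + 2 + 1 + 0 + 3 = 13
n = len(siblings)            # 7 friends
mean_siblings = total / n    # 13 / 7

# Putting the values in order (as for the median) does not change the mean
mean_sorted = sum(sorted(siblings)) / len(siblings)

print(total, n, round(mean_siblings, 2))   # 13 7 1.86
print(mean_siblings == mean_sorted)        # True
```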

 

Let’s do it again, as practice makes perfect.

 

Example 3.5 Textbook Prices For a Semester, Raw Data

 

Depending on the courses you take in a semester, what you pay for books will vary, but let’s say we’re interested in how much you pay for books in a typical semester. Perhaps you are very well organized and want to finish your degree as quickly as possible, so you have decided to take five courses per semester. For simplicity’s sake, let’s assume you were assigned one book per course. These are the books’ prices: $120, $230, $300, $65, $30. How much did you pay for a book on average?

 

    \[\frac{(120+230+300+65+30)}{5}=\frac{745}{5}=149\]

 

That is, despite the fact that some of your books were expensive (like the $300 one), and some relatively cheap (like the $30 one), the average price you paid for a book in that semester was $149.
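Python’s standard library also includes a ready-made function, `statistics.mean`, that performs the same add-and-divide calculation. The sketch below, assuming Python 3, applies it to the five textbook prices; the list name `prices` is our own.

```python
import statistics

# Illustrative sketch: prices (in dollars) of the five textbooks in Example 3.5
prices = [120, 230, 300, 65, 30]

print(sum(prices))              # 745, the sum of all values
print(statistics.mean(prices))  # 149, i.e., 745 / 5
```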

 

Now that we’ve seen how the mean works in practice, let’s generalize what we did in the two examples above using proper notation. Fair warning: the formula below does look complicated, but remember what we just did. Our calculations were quite simple (adding all values, then dividing their sum by their total number), and so is the formula. As usual, it simply restates what we’ve said in words in mathematical shorthand. If you know what each symbol in the shorthand stands for, you know what the formula means. So, take a deep breath:

 

    \begin{equation*} \frac{x_1+x_2+x_3+\dots+x_N}{N}=\frac{\sum\limits_{i=1}^{N}{x_i}}{N}=\overline{x} \tag{1} \end{equation*}

 

where ∑ stands for “sum”[2], \sum\limits_{i=1}^{N} tells us to sum all cases from the first (i = 1) to the last (i = N), x_i stands for the value of the i-th case, where i is any whole number from 1 to N, N is the total number of cases, and \overline{x} indicates the mean[3], i.e., the average of all the x_i’s. Thus, the formula simply tells you to add all values and divide their sum by their total number, just as we did in the examples.
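To connect the notation to concrete steps, here is a minimal Python sketch that follows formula (1) term by term: it lets i run from 1 to N, adds each x_i to a running total, and divides the total by N. The function name `mean_from_formula` is hypothetical, chosen only for this illustration. Because Python counts list positions from 0, the case x_i is stored at position i - 1.

```python
def mean_from_formula(x):
    """Hypothetical helper: mean of interval/ratio values, following
    x-bar = (x_1 + x_2 + ... + x_N) / N."""
    N = len(x)                  # N: the total number of cases
    if N == 0:
        raise ValueError("the mean is undefined when there are no cases")
    total = 0
    for i in range(1, N + 1):   # i runs from 1 (first case) to N (last case)
        total += x[i - 1]       # x_i sits at position i - 1 in a Python list
    return total / N            # divide the sum by N


print(round(mean_from_formula([2, 1, 4, 2, 1, 0, 3]), 2))  # 1.86
print(mean_from_formula([120, 230, 300, 65, 30]))         # 149.0
```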

 

So far, we have only calculated means for raw data, i.e., data not presented in a frequency table. Will the calculation of the mean be different if we have a frequency table instead? While the principle is the same, the fact that the values are grouped by frequency in frequency tables requires a slight modification to our calculations. Here’s a small-scale illustration to demonstrate the principle before we do an example with a larger N.

 

Example 3.4 (B) Mean for Number of Siblings, Aggregated Data

 

Arranging the raw data from Example 3.4 (A) above, we again get the following table.

Table 3.3 Frequency Table for Number of Siblings

Value Frequency
0 1
1 2
2 2
3 1
4 1
Total 7

According to the formula for the mean, we need to add all values together and then divide their sum by their total number. When the values are disaggregated (i.e., raw), we can add them up right away. However, when they are grouped by frequency, we first need to multiply each value by its respective frequency, then add the value-times-frequency products together, and finally divide that sum by the total number of cases (the sum of the frequencies), like this:

 

    \[\frac{\sum\limits_{i=1}^{N}{x_i}}{N}=\frac{(0+1+1+2+2+3+4)}{7}=\frac{0(1)+1(2)+2(2)+3(1)+4(1)}{7}=\frac{13}{7}=1.86=\overline{x}\]

 

Again, the average number of siblings of these seven friends is 1.86, as previously calculated.
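The value-times-frequency procedure can be sketched in Python as follows, assuming the frequency table is stored as a dictionary that maps each value to its frequency (a representation we chose for illustration; `grouped_mean` is a hypothetical name). Note that N is the sum of the frequencies, not the number of rows in the table.

```python
def grouped_mean(freq_table):
    """Hypothetical helper: mean from a frequency table given as {value: frequency}."""
    # Multiply each value by its frequency and add up the products
    weighted_sum = sum(value * freq for value, freq in freq_table.items())
    # N is the total number of cases, i.e., the sum of the frequencies
    N = sum(freq_table.values())
    return weighted_sum / N


# Table 3.3: number of siblings (value -> frequency)
siblings_table = {0: 1, 1: 2, 2: 2, 3: 1, 4: 1}

print(round(grouped_mean(siblings_table), 2))  # 1.86
```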

 

Now let’s apply the same principle to a new, larger-N example.

 

Example 3.6 Age of Classmates, Aggregated Data

 

Imagine you are doing a survey for one of your class assignments and one of the questions is about age. You aggregate the data by frequency and you get the following table.

 

Table 3.5 Mean for Age of Classmates

Value Frequency
19 1
20 10
21 12
22 8
25 2
27 1
35 1
TOTAL 35

By the formula, we have:

    \[\frac{\sum\limits_{i=1}^{N}{x_i}}{N}=\frac{19(1)+20(10)+21(12)+22(8)+25(2)+27(1)+35(1)}{35}=\frac{19+200+252+176+50+27+35}{35}=\frac{759}{35}=21.69=\overline{x}\]

 

That is, the average age of your classmates in that class is 21.69 years, or a bit less than 22 years.
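As a check on the arithmetic, the Python sketch below computes the mean age two ways: once with the value-times-frequency method, and once by expanding Table 3.5 back into a list of 35 individual ages and averaging that list. Both routes give the same answer, which is why multiplying by the frequencies works. The dictionary representation and variable names are our own assumptions.

```python
# Illustrative sketch. Table 3.5: age (value) -> number of classmates (frequency)
age_table = {19: 1, 20: 10, 21: 12, 22: 8, 25: 2, 27: 1, 35: 1}

# Method 1: multiply each value by its frequency, add, divide by N
weighted_sum = sum(age * freq for age, freq in age_table.items())
N = sum(age_table.values())
mean_grouped = weighted_sum / N

# Method 2: rebuild the raw data (each age repeated 'freq' times) and average it
raw_ages = [age for age, freq in age_table.items() for _ in range(freq)]
mean_raw = sum(raw_ages) / len(raw_ages)

print(weighted_sum, N)           # 759 35
print(round(mean_grouped, 2))    # 21.69
print(mean_grouped == mean_raw)  # True
```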


  1. Note that in specific cases it's possible to calculate something like an average for certain ordinal variables, for example, Likert scales, to the extent that their numerical labels reflect roughly equal, stable distances between categories. This should be done with extreme care and ample justification, however, and beginner researchers (like you) are advised against using means for ordinal variables.
  2. ∑ is pronounced "SIG-ma" and is the Greek capital letter S.
  3. \overline{x} is pronounced "EX-bar".

License

Simple Stats Tools Copyright © by Mariana Gatzeva. All Rights Reserved.
