Chapter 3 Measures of Central Tendency
Now that you have learned the preliminaries (what datasets and variables are, and how to summarize the information within a variable in tabular and graphical formats), it’s time to turn to applied statistics proper. Statistics allows us to analyze information, i.e., to learn more than what we see at first glance. We scrutinize the collected data in great detail to get the most out of it, in terms of both description (examining what we see) and inference (reaching evidence-based conclusions).
Aptly, we talk about descriptive statistics and inferential statistics. In the latter half of this book we will turn to inferential statistics, which is devoted to inferential analysis on the basis of probability theory. We start now with descriptive statistics, which is devoted to the descriptive analysis of variables, i.e., to learning all we possibly can about a variable and its distribution. As you may recall from Chapter 2’s introduction, a variable’s distribution is the way the observations/cases are distributed across the variable’s categories. The cases can be concentrated closer together or more spread out, and exploring such features of a variable’s distribution is the focus of this chapter and the next.
In addition to the visual summary of a variable that graphs provide, which allows us to virtually see the variable’s distribution, descriptive analysis generally yields two further types of information. They are called central tendency and dispersion.
Considering what a variable in a dataset looks like, recall that a variable has a list of observations/cases (think, for example, of the responses collected through a survey question), where the list is of size N (N, again, is the number of elements in general, or of respondents if we focus specifically on people, as we usually do). Thus, on the one hand, we talk about typical cases, or where cases tend to cluster (for example, what the most frequent response is, or whether respondents tend to give similar answers), and about what the “centre” of the variable’s distribution is. Measures related to this type of information are called measures of central tendency. There are three of them, and we explore each in turn in the current chapter: the mode, the median, and the mean.
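As a quick preview of these three measures, here is a minimal sketch in Python using only the standard library's statistics module. The list of responses is made up purely for illustration (imagine answers to a survey question scored from 1 to 5), and the variable names are not taken from this book.

```python
from statistics import mean, median, multimode

# Hypothetical survey responses on a 1-5 scale (invented for illustration).
responses = [2, 3, 3, 4, 5, 3, 1, 4]

# N is the number of observations/cases.
N = len(responses)

# Mode: the most frequent response(s). multimode returns a list
# because a variable can have more than one mode.
modes = multimode(responses)

# Median: the middle value once the responses are sorted
# (the average of the two middle values when N is even).
med = median(responses)

# Mean: the sum of all responses divided by N.
avg = mean(responses)

print(f"N = {N}")
print(f"mode(s) = {modes}")
print(f"median = {med}")
print(f"mean = {avg}")
```

Each of the three measures is defined and discussed properly in the sections that follow; the point here is only that all three try to summarize where the cases in a variable tend to cluster.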
On the other hand, we can also talk about how much a variable’s distribution is “spread out”. That is, if a variable is called that because the responses vary across people, how variable is it actually: does it vary a lot or only a little? Are all responses clustered around the “centre” or are they relatively dispersed? Measures related to this type of information are called measures of dispersion, and they are presented in the next chapter.
To summarize, we describe variables by providing and exploring 1) the visual summary of their distribution (i.e., a graph), 2) their measures of central tendency, and 3) their measures of dispersion.
There is a catch, however: not all measures of central tendency and dispersion are appropriate for all variables. Just as not every graph is appropriate for every type of variable, whether a measure of central tendency or dispersion is applicable to a variable depends on the variable’s level of measurement.
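To make the catch concrete, the sketch below encodes the usual textbook convention for central tendency: the mode applies at every level of measurement, the median requires at least ordinal data, and the mean requires interval or ratio data. The lookup table and the function name applicable_measures are hypothetical, written only for this illustration; the sections that follow give the full reasoning and any qualifications.

```python
from statistics import multimode

# Usual textbook convention (an assumption of this sketch, not a rule
# quoted from this chapter): which measures suit which level.
APPLICABLE = {
    "nominal": ["mode"],
    "ordinal": ["mode", "median"],
    "interval": ["mode", "median", "mean"],
    "ratio": ["mode", "median", "mean"],
}


def applicable_measures(level):
    """Return the central tendency measures suited to a level of measurement."""
    key = level.strip().lower()
    if key not in APPLICABLE:
        raise ValueError(f"unknown level of measurement: {level!r}")
    return APPLICABLE[key]


# Hypothetical nominal variable: favourite colour. The categories have
# no order and no numeric value, so only the mode is meaningful.
colours = ["red", "blue", "red", "green"]

print(applicable_measures("nominal"))
print(multimode(colours))
```

Asking for the mean of the colours above would make no sense, which is exactly why the level of measurement has to be settled before any measure is computed.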
I did already warn you that determining the proper level of measurement of a variable is key: without it, you can correctly carry out neither descriptive nor inferential analysis. Go back and reread Section 1.3 if necessary (https://pressbooks.bccampus.ca/simplestats/chapter/1-3-levels-of-measurement/), or what comes next will make little sense to you.
But enough with the boring theory: on to the application of central tendency measures!