Chapter 4 Measures of Dispersion
Early on in Chapter 3 we established that three pieces of information help us describe variables. Describing a variable lets us glean something from its distribution beyond the raw list of observations that make it up. In other words, through descriptive analysis we learn something about the cases that is not readily observable when all we have is a collection of data points.
Graphs provide a first glimpse at a variable’s distribution. Measures of central tendency tell us about the typical cases: where most cases tend to cluster, or where the “centre” of the data lies. We now turn to measures of dispersion, the last of the three key pieces of descriptive information about variables. Measures of dispersion tell us how “spread out” a variable’s cases are; they measure, as it were, how “clustered” the data are, or how dispersed the cases are across the variable’s values.
A simple illustration will make measures of dispersion easier to understand. Take two sets of three numbers: “4, 5, 6” and “2, 5, 8”. By now, you should be able to tell immediately that the median of both sets is 5 (in each set, one value lies below 5 and one lies above it). You might also see easily that the mean of both sets is 5; if not, this is how we get it:
$$\bar{x} = \frac{4 + 5 + 6}{3} = \frac{15}{3} = 5$$
$$\bar{x} = \frac{2 + 5 + 8}{3} = \frac{15}{3} = 5$$
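If you would like to check this arithmetic by computer, here is a minimal sketch in Python using the standard-library `statistics` module. The names `set_a` and `set_b` are purely illustrative; the loop computes each mean both with `statistics.mean` and by hand (sum divided by count), alongside the median.

```python
from statistics import mean, median

# The two illustrative sets from the text (variable names are hypothetical).
set_a = [4, 5, 6]
set_b = [2, 5, 8]

for name, values in [("set_a", set_a), ("set_b", set_b)]:
    # Mean by hand: add up the values, then divide by how many there are.
    by_hand = sum(values) / len(values)
    print(name, "mean:", mean(values), "by hand:", by_hand, "median:", median(values))
```

Both sets report a mean of 5 and a median of 5, so central tendency alone cannot tell them apart.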
Even though both sets, “4, 5, 6” and “2, 5, 8”, have the same measures of central tendency, you’d be hard-pressed to claim they are the same sets of numbers. Take a look at the image below (or just look at a ruler of your own, if you have one close by): the values 4 and 6 are much closer to 5 than 2 and 8 are. That is, the values of our first set are clustered more tightly around the “centre”, while the values of our second set are spread more loosely around it. This “clustering” versus “spreading” is precisely what measures of dispersion capture.
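The “ruler” comparison can also be made numerically. The sketch below is only an informal illustration, not one of the four measures discussed in this chapter: a hypothetical helper, `distances_from_centre`, lists how far each value sits from its set’s mean.

```python
from statistics import mean

set_a = [4, 5, 6]
set_b = [2, 5, 8]

def distances_from_centre(values):
    """Hypothetical helper: absolute distance of each value from the set's mean."""
    centre = mean(values)
    return [abs(v - centre) for v in values]

print(distances_from_centre(set_a))  # [1, 0, 1]
print(distances_from_centre(set_b))  # [3, 0, 3]
```

Every value in the second set lies at least as far from 5 as the corresponding value in the first set, which is the “spreading” the ruler shows.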
There are four commonly used measures of dispersion.[1] Before we examine each of them, note what I have just demonstrated: it is quite possible for two variables to have the same measures of central tendency but different measures of dispersion.
The four measures of dispersion fall into two groups. We begin with the simpler two, the range and the interquartile range, and then turn to the more complicated (but more widely used) pair, the variance and the standard deviation.
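As a preview, here is a minimal Python sketch that computes all four measures for the two example sets, using the standard-library `statistics` module. Two assumptions are worth flagging, since this section does not settle them: quartiles are computed with the `"inclusive"` interpolation method (other quartile conventions give different interquartile ranges for such small sets), and the variance and standard deviation are the sample versions, which divide by n − 1. The helper name `dispersion_summary` is hypothetical.

```python
from statistics import quantiles, stdev, variance

set_a = [4, 5, 6]
set_b = [2, 5, 8]

def dispersion_summary(values):
    """Hypothetical helper returning the four measures for one set of values."""
    # Range: largest value minus smallest value.
    value_range = max(values) - min(values)
    # Interquartile range: third quartile minus first quartile.
    # ASSUMPTION: "inclusive" quartile method; other conventions differ.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    # ASSUMPTION: sample variance and standard deviation (divide by n - 1).
    return {
        "range": value_range,
        "iqr": q3 - q1,
        "variance": variance(values),
        "std_dev": stdev(values),
    }

for name, values in [("set_a", set_a), ("set_b", set_b)]:
    print(name, dispersion_summary(values))
```

Under these assumptions the first set has a range of 2, an interquartile range of 1, a variance of 1 and a standard deviation of 1, while the second set has 6, 3, 9 and 3. Switching to `statistics.pvariance` and `statistics.pstdev`, which divide by n, changes the numbers but not the conclusion: the second set is more dispersed on every measure, even though both sets share the same mean and median.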
[1] A fifth measure of dispersion exists but is less commonly used. I'll introduce it only insofar as it helps in understanding the standard deviation, the most widely used measure of dispersion.