Chapter 4 Measures of Dispersion

4.4 Variance Continued, Standard Deviation

I’m sure you’ll agree the preceding section was a lot to take in. And here’s the kicker: after all that, we arrived at something we cannot easily or intuitively interpret, given the squared units. However, the variance is used a lot in statistics, for a great many things. Generally, the larger the variance, the greater the variability of the variable, or the larger the “dispersed-ness” of the cases.

 

Despite the seemingly convoluted way we arrived at the variance and all the calculations and mathematical notation, what we did was actually quite simple. (No, really!)

 

To recap: just like we average all values by summing them up and dividing the sum by their total number to get the mean, we average the distances of the values from the mean by summing them up and dividing the sum by their total number. The only difference is that in order to be able to sum the distances, we need to square each of them first, or we cannot proceed (recall that the un-squared distances always add up to zero).

 

Here are the formulas for the mean and the variance together so that you can compare:

 

\frac{\sum\limits_{i=1}^{N}{x_i}}{N} = \overline{x}   ← mean

 

\frac{\sum\limits_{i=1}^{N}{(x_i-\overline{x})^2}}{N} = \sigma^2    ← variance
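In case you read code more easily than formulas, here is a minimal Python sketch of the two computations above (Python is not part of this textbook’s toolkit, we use SPSS later in the chapter, so treat this purely as an illustration). It uses the number-of-siblings values from Section 3.2, which we will revisit in Example 4.5 below.

    # A sketch of the two formulas above: the mean, and the
    # population (divide-by-N) version of the variance.

    def mean(values):
        return sum(values) / len(values)  # sum the x_i, divide by N

    def variance(values):
        x_bar = mean(values)
        squared_distances = [(x - x_bar) ** 2 for x in values]
        return sum(squared_distances) / len(values)  # sum of squares, divided by N

    siblings = [2, 1, 4, 2, 1, 0, 3]  # the values from Section 3.2
    print(mean(siblings))      # 1.857...
    print(variance(siblings))  # 1.551..., i.e., the 1.55 we obtain below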

 

Now that I have you feeling somewhat comfortable, I have a confession to make. The formula above isn’t the only version of the variance formula that exists, or the only one we will be using.

 

Bear with me (and welcome back, to those who threw the reading away in disgust) — I promise to explain everything when we get to inferential statistics later in the textbook, as the explanation requires concepts and terminology we have not yet covered and which cannot be easily introduced at this point. (Hint: it deals with estimation and uncertainty.)[1]

 

One thing worth noting right away, however, even without the proper explanation yet: when working with typical datasets, SPSS will produce variances by dividing the sum of squares by N-1 instead of by N.
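If you’d like to see what the difference amounts to, Python’s built-in statistics module happens to implement both versions, so a short sketch (again, just an illustration, not something we need for SPSS) can show them side by side:

    import statistics

    siblings = [2, 1, 4, 2, 1, 0, 3]

    print(statistics.pvariance(siblings))  # divides by N:   1.551...
    print(statistics.variance(siblings))   # divides by N-1: 1.809... (the SPSS-style result)

As you can see, with only seven cases the two results differ noticeably; with hundreds of cases they would be nearly identical.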

 

Watch Out!! #9 … for The Order of Operations

 

When considering the formula for variance, and the steps we took to calculate it, pay special attention to the sum of squares. That is, we need a sum of squares (i.e., we add the squared distances from the mean together): we first calculate the distances, then square them, and finally sum the squared distances up.

 

A common mistake, however, is to try to calculate the distances, sum them up, then square the sum. As explained above, the (un-squared) distances add up to zero, and squaring the zero will not improve things. A version of this mistake is to calculate the distances, then try to sum them and divide them by N-1, and then square the result. Obviously this would also be unsuccessful. To avoid this type of frustration, try to remember the purpose of the squaring: to “turn” all distances into positive numbers. Everything else we do (summing, dividing), we do to the already squared distances.
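Here are the wrong and the right order of operations side by side, as a quick Python sketch (once more, purely for illustration), using the number-of-siblings values:

    siblings = [2, 1, 4, 2, 1, 0, 3]
    x_bar = sum(siblings) / len(siblings)      # the mean, 1.857...

    distances = [x - x_bar for x in siblings]

    # WRONG: summing first makes the distances cancel out to zero,
    # and squaring that zero does not improve things.
    print(sum(distances) ** 2)                 # 0 (up to tiny rounding error)

    # RIGHT: square each distance first, then sum.
    print(sum(d ** 2 for d in distances))      # 10.857..., the sum of squares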

 

To show you that the calculation of the variance is simple when done without the protracted explanations, let’s take another example we have used before: number of siblings.

Example 4.5 Variance for Number of Siblings

 

In discussing the median in Section 3.2 (https://pressbooks.bccampus.ca/simplestats/chapter/3-2-median/), we imagined you asked seven of your friends about the number of their siblings. These were the values we used:  2, 1, 4, 2, 1, 0, 3.

 

Let’s produce the variance in four simple steps after calculating the mean: Step 1A, obtain the distances from the mean; Step 1B, square the distances from the mean; Step 2, obtain the sum of squares (i.e., sum the squared distances up); Step 3, divide by N.

 

Preliminary step: obtain the mean.

\frac{\sum\limits_{i=1}^{N}{x_i}}{N}=\frac{2+1+4+2+1+0+3}{7}=\frac{13}{7}=1.857= \overline{x}

 

Steps 1A and 1B are presented in the table below:

 

Table 4.4 Calculating Distances to the Mean and Squaring Each Distance

x_i (x_i - \overline{x}) (x_i - \overline{x})^2
2 (2 – 1.857) = 0.143 (0.143)^2 = 0.02
1 (1 – 1.857) = -0.857 (-0.857)^2 = 0.734
4 (4 – 1.857) = 2.143 (2.143)^2 = 4.592
2 (2 – 1.857) = 0.143 (0.143)^2 = 0.02
1 (1 – 1.857) = -0.857 (-0.857)^2 = 0.734
0 (0 – 1.857) = -1.857 (-1.857)^2 = 3.448
3 (3 – 1.857) = 1.143 (1.143)^2 = 1.306

Step 2, obtain the sum of squares:

 

\sum\limits_{i=1}^{N}{(x_i-\overline{x})^2} = 0.02+0.734+4.592+0.02+0.734+3.448+1.306=10.854    ← Sum of Squares

 

Step 3, divide the sum of squares (rounded to two decimals, 10.85) by N, i.e., by 7:

 

\frac{\sum\limits_{i=1}^{N}{(x_i-\overline{x})^2}}{N}=\frac{10.85}{7}=1.55= \sigma^2    ← variance
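(If you want to double-check the arithmetic, the illustrative statistics module sketch from earlier reproduces this result; note that it works with the unrounded mean, so it gives the variance before our rounding to 1.55.)

    import statistics

    siblings = [2, 1, 4, 2, 1, 0, 3]
    print(statistics.pvariance(siblings))  # 1.5510..., our 1.55 before rounding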

 

Thus, we find that your seven friends have an average squared distance of about 1.6 from the mean number of siblings, 1.9 (rounded up from 1.857).

 

Oh, great, you are probably thinking now, and I can imagine the sarcasm — we calculated something we can’t even interpret properly. I mean, it’s more than a tad awkward to try to explain “an average squared distance of about 1.6 from the mean number of siblings” to anyone not versed in statistics. Maybe it would be better if we could get rid of the “squared-ness”?

 

You know what? We can. The standard deviation is here to help.

 

Standard deviation. Believe it or not, after all the steps we went through to get to the variance, calculating the standard deviation is a breeze: specifically, a breeze that turns the squared units back into standard units, hence the name.

 

See for yourself:

 

\sqrt{\frac{\sum\limits_{i=1}^{N}{(x_i-\overline{x})^2}}{N}} = \sqrt{\sigma^2}=\sigma    ← standard deviation

 

Despite its scary looks, this is actually just the formula for variance under a square root. That is, we take the square root of the variance to get the standard deviation. That’s it. Nothing more. Just a regular square root, and we’re there. Cue a sigh of relief![2]
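In code, the “just a square root” claim really is one line; Python’s statistics module also offers pstdev(), which does the whole thing in one call (an illustrative sketch, as before):

    import math
    import statistics

    siblings = [2, 1, 4, 2, 1, 0, 3]

    print(math.sqrt(statistics.pvariance(siblings)))  # the square root of the variance...
    print(statistics.pstdev(siblings))                # ...equals the standard deviation, 1.245...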

 

Now that we know how to get back to standard units, let’s do that for the two examples we used. We had a variance of σ² = 15.21 for hours worked per week in the previous section and a variance of σ² = 1.55 for number of siblings in the example above. Square-rooting gives us the following:

 

    \[\sqrt{\sigma^2}=\sqrt{15.21}=3.9\]

 

and

 

    \[\sqrt{\sigma^2}=\sqrt{1.55}=1.25\]

 

Now these we can interpret: on average, your hours worked per week deviated from the mean of 8.7 hours per week by 3.9 hours, and your friends deviated from the average number of siblings, 1.9, by 1.25 siblings.

 

To repeat, the standard deviation is the square root of the variance. The standard deviation is a measure of dispersion which gives us the average deviation of the cases from the mean. (Technically, it is the square root of the average squared distance from the mean, which brings us back to the variable’s original, standard units.)

 

Do It! 4.2 Longevity of The First Fifteen Canadian Prime Ministers

 

Calculate the variance and standard deviation of the longevity of the first fifteen Prime Ministers of Canada. In chronological order (starting with Macdonald and ending with Pierre Trudeau), their ages at the time of death were: 76, 70, 72, 49, 93, 94, 77, 82, 86, 75, 76, 91, 83, 75, and 80. Interpret your results (i.e., explain what you have found beyond “the standard deviation is …”).

 

You can use a table like Table 4.4 to organize your calculations. (Hint: Start with calculating the mean age at death, \overline{x}, and round it up to a whole number to make your job easier.) Here x_i is age at death for each PM and N=15.

 

You can check your answers in this footnote.[3]
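(If you would rather check your work in code than peek at the footnote, here is one way, in the same illustrative Python as before. Keep in mind that Python works with the unrounded mean, so its answers come out slightly smaller than a hand calculation that rounds the mean to 79.)

    import statistics

    ages = [76, 70, 72, 49, 93, 94, 77, 82, 86, 75, 76, 91, 83, 75, 80]

    print(statistics.pvariance(ages), statistics.pstdev(ages))  # divide by N:   ~114.8 and ~10.7
    print(statistics.variance(ages), statistics.stdev(ages))    # divide by N-1: ~123.0 and ~11.1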

 

Of course, one wouldn’t normally calculate variances and standard deviations by hand: we did it here only so that, by obtaining the measures ourselves, you can understand what they are and what they really provide us with. Usually, however, we simply use SPSS.

 

SPSS Tip 4.2 Obtaining Variance and Standard Deviation

  • From the Main Menu, select Analyze, then Descriptive Statistics, and then Frequencies;
  • Select your variable of choice from the list on the left and use the arrow to move it to the right side of the window;
  • Click on the Statistics button on the right;
  • In this new window, check Variance and Standard deviation in the Dispersion section at the bottom left;
  • Click Continue, then OK.
  • The Output window will provide a table with the requested measures.
  • Make sure you know how to interpret your results! (Try to use as little statistics jargon as possible.)

  1. If you'd like a preview, the alternative, to-be-explained-later, formula for variance is:   \frac{\sum\limits_{i=1}^{N}{(x_i-\overline{x})^2}}{N-1} = s^2    ← variance   As you can see, the modification is quite small: instead of dividing the sum of squares by the total number N, we divide it by the total minus one, N-1. If it makes you feel better, dividing by N or by N-1 produces generally similar results, in terms of the magnitude of the variance. We also denote this version with a regular lower-case s^2.
  2. Note, however, that just like there is an "alternative", to-be-explained-later, formula for variance, there is an "alternative" formula for standard deviation, following the same principle regarding dividing the sum of squares by N-1 instead of by N:   \sqrt{\frac{\sum\limits_{i=1}^{N}{(x_i-\overline{x})^2}}{N-1}} = \sqrt{s^2}=s   ← standard deviation    As well, SPSS will use this (N-1) version of the formula when working with variables in a dataset.
  3. The mean is 79 years; the sum of squares 1,724; the variance 114.9; the standard deviation 10.7 years. However, if you calculated the variance and standard deviation with N-1 in the denominators, you will get a variance of 123.1 and a standard deviation of 11.1 years. The difference is as large as it is due to the small N. Had we been working with a real dataset of hundreds or thousands of cases, the difference between the just-N and N-1 versions of the formulas would have been less pronounced.
