Chapter 1 Variables and Their Measurement

1.5 Discrete and Continuous Variables

 

I will introduce a final useful typology by which variables can be grouped: discrete and continuous.

 

By definition, variables called discrete (note, not discreet!) have a finite number of categories (i.e., there is “space” between the categories, and nothing occupies that space), while variables called continuous have a potentially infinite number of values (i.e., a value can exist between any two given values, in ever smaller “spaces” between them, without end). To make things easier to understand, and with more than a little risk of oversimplification, in a very broad sense you can think of nominal and ordinal variables as discrete and of interval/ratio variables as continuous.[1] For example, hair colour, religious affiliation, and educational attainment (as measured in educational degrees) are all discrete: they have a finite number of distinct categories.

 

On the other hand, age, income, or exam scores are all continuous: a number (value) can exist between any two given values, depending on how precise you want your measurement to be. To take age, for example, if two people report being 20 and 22, respectively, it’s obviously possible that another person is 21. However, we need not round to full years; between two people aged 21 and 22, a value of 21.5 (or 21 years and 6 months) is possible. Further, between the ages of 21 years and 21 years and 6 months, we can have a value of 21 years and 3 months, and so on, until we are down to counting days, then hours, then minutes, then seconds, then milliseconds, then microseconds, then nanoseconds, etc. The point is that, in theory, there is always another number between any two numbers (which is reflected in the possibility of an infinite number of digits after the decimal point). The same applies to income and exam scores.
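To make the “there is always a value in between” idea concrete, here is a minimal sketch in Python. The function name `values_between` and the choice of four halving steps are my own, purely for illustration; exact fractions are used so that the halving could, in principle, go on forever without rounding.

```python
from fractions import Fraction


def values_between(low, high, steps):
    """Return `steps` values strictly between low and high.

    Each new value is the midpoint between `low` and the previous
    midpoint, so the values keep getting closer to `low` without
    ever reaching it.
    """
    low, high = Fraction(low), Fraction(high)
    midpoints = []
    for _ in range(steps):
        high = (low + high) / 2  # halve the remaining "space"
        midpoints.append(high)
    return midpoints


# Ages (in years) between 21 and 22: 21.5, 21.25, 21.125, 21.0625
ages = values_between(21, 22, 4)
print([float(a) for a in ages])
```

However many steps you ask for, every value returned is different and still lies between 21 and 22.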

 

In practice, however, things are different. In sociological research (as in other similar disciplines), the data collected are empirically discrete, as the values collected are finite in number and are typically rounded to whole numbers: we don’t bother to measure age in anything but years, or income in anything but dollars (not cents), etc. Still, we usually call interval/ratio variables continuous, because of the potential for an infinite number of values.
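A short sketch of how recording a continuous variable in whole units makes it empirically discrete. The ages below are made up for illustration, and I am assuming age is recorded in completed years (i.e., rounded down).

```python
import math

# Hypothetical exact ages in years (invented values, not real data)
exact_ages = [20.08, 20.47, 20.93, 21.25, 21.5, 21.99]

# Age as it is usually recorded: completed whole years
recorded_ages = [math.floor(age) for age in exact_ages]

distinct_exact = len(set(exact_ages))        # six different exact values
distinct_recorded = len(set(recorded_ages))  # only two recorded values

print(recorded_ages)
print(distinct_exact, distinct_recorded)
```

Six different exact ages collapse into just two recorded values, 20 and 21, even though the underlying variable is continuous in theory.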

 

At the same time, however, some ratio variables are truly discrete. Think, for example, about a measure called number of children of the respondent. Clearly, there is no possibility for an infinite number of values, just like with any “number of people”-type variable: people can only be counted in whole numbers, and the count is always finite.

 

All this is undoubtedly confusing, so here is a practical tip for applied research, and what you need to focus on. Regardless of whether a variable is discrete or continuous in theory, in practice all variables you will encounter in real-life, actual datasets will be discrete. What we do is treat some variables as discrete and other variables as continuous for the purposes of statistical analysis. The rule of thumb is to make the differentiation based on the number of categories/values: nominal and ordinal variables typically have relatively few categories, so we treat them as discrete, while interval/ratio variables typically have a relatively large number of values, so we treat them as continuous. If, however, an ordinal variable has a relatively large number of categories, it may be treated as continuous, and, on the flip side, if an interval/ratio variable has relatively few values, it may be treated as discrete. Generally, and assuming proper justification (i.e., a sufficiently large number of categories in the first case, or a sufficiently small number of values in the second), the decision to treat an ordinal variable as continuous or an interval/ratio variable as discrete remains a matter of the researcher’s discretion.

 

Finally, what is the magic number in the “relatively large number of categories/values” rule? This also depends, but from what I have seen in practice, the number is around 7 to 10 categories/values for most purposes (i.e., if a variable has more categories/values than that, it is treated as continuous, and if it has fewer, it is treated as discrete).
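The rule of thumb can be sketched as a small Python function. This is only an illustration under stated assumptions, not a fixed procedure: the function name `suggested_treatment` is hypothetical, the default cut-off of 10 is one choice from the 7 to 10 range mentioned above, and I am assuming that nominal variables are always treated as discrete, however many categories they have.

```python
def suggested_treatment(level, values, threshold=10):
    """Suggest treating a variable as 'discrete' or 'continuous'.

    level     -- 'nominal', 'ordinal', 'interval', or 'ratio'
    values    -- the observed values of the variable
    threshold -- assumed cut-off for "relatively many" distinct values
    """
    if level not in {"nominal", "ordinal", "interval", "ratio"}:
        raise ValueError(f"unknown level of measurement: {level!r}")

    # Assumption: categories without an order are never treated as continuous
    if level == "nominal":
        return "discrete"

    # Ordinal and interval/ratio: decide by the number of distinct values
    n_distinct = len(set(values))
    return "continuous" if n_distinct > threshold else "discrete"


# Number of children: ratio level, but only a handful of distinct values
print(suggested_treatment("ratio", [0, 1, 2, 3, 1, 2, 0]))

# An ordinal 0-10 rating scale with all 11 categories observed
print(suggested_treatment("ordinal", range(11)))
```

The first call suggests “discrete” and the second “continuous”. The output is only a suggestion: as noted above, the final decision remains at the researcher’s discretion.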

 


  1. Technically speaking, in theory nominal and some ordinal variables are categorical, ordinal variables with numerical categories are discrete, and interval/ratio variables are continuous. In practice, things are less clear cut.

License

Simple Stats Tools Copyright © by Mariana Gatzeva. All Rights Reserved.
