68 Descriptive Statistics
Quantitative data are analyzed in two main ways: (1) Descriptive statistics, which describe the data (the characteristics of the sample); and (2) Inferential statistics. More formally, descriptive analysis “refers to statistically describing, aggregating, and presenting the constructs of interest or associations between these constructs” (Bhattacherjee, 2012, p. 119). All quantitative data analysis must provide some descriptive statistics. Inferential analysis, on the other hand, allows you to draw inferences from the data, i.e., make predictions or deductions about the population from which the sample is drawn.
Developing Descriptive Statistics
As mentioned above, descriptive statistics are used to summarize data (mean, mode, median, variance, percentages, ratios, standard deviation, range, skewness and kurtosis). When you describe or summarize the distribution of a single variable, you are doing univariate descriptive statistics (e.g. mean age). If you are interested in describing the relationship between two variables, this is called bivariate descriptive statistics (e.g. mean female age), and if you are interested in more than two variables, you are presenting multivariate descriptive statistics (e.g. mean rural female age). You should always present descriptive statistics in your quantitative papers because they provide your readers with baseline information about the variables in a dataset, which can indicate potential relationships between variables. In other words, they provide information on what kind of bivariate, multivariate and inferential analyses might be possible. Box 10.2 provides some resources for generating and interpreting descriptive statistics. Next, we will discuss how to present and describe descriptive statistics in your papers.
Box 10.2 – Resources for Generating and Interpreting Descriptive Statistics
See UBC Research Commons for tutorials on how to generate and interpret descriptive statistics in SPSS: https://researchcommons.library.ubc.ca/introduction-to-spss-for-statistical-analysis/
See also this video for a Stata tutorial on how to generate descriptive statistics: Descriptive statistics in Stata® (YouTube)
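To make the distinction between univariate, bivariate and multivariate description above concrete, here is a minimal Python sketch using only the standard library. The respondents, the field names (`age`, `gender`, `residence`) and the helper `mean_age` are all hypothetical and invented for illustration; in practice you would generate these figures in SPSS, Stata or similar software.

```python
from statistics import mean

# Hypothetical respondents; every value here is invented for illustration.
respondents = [
    {"age": 24, "gender": "female", "residence": "rural"},
    {"age": 31, "gender": "male", "residence": "urban"},
    {"age": 28, "gender": "female", "residence": "urban"},
    {"age": 35, "gender": "female", "residence": "rural"},
    {"age": 22, "gender": "male", "residence": "rural"},
    {"age": 40, "gender": "male", "residence": "urban"},
]


def mean_age(rows, **conditions):
    """Mean age of the rows that match every keyword condition."""
    ages = [
        row["age"]
        for row in rows
        if all(row[key] == value for key, value in conditions.items())
    ]
    return mean(ages)


# Univariate: a single variable (age).
overall = mean_age(respondents)
# Bivariate: age described within a category of a second variable (gender).
female = mean_age(respondents, gender="female")
# Multivariate: age described within categories of two further variables.
rural_female = mean_age(respondents, gender="female", residence="rural")

print(f"Mean age: {overall:.1f}")
print(f"Mean female age: {female:.1f}")
print(f"Mean rural female age: {rural_female:.1f}")
```

Each additional condition narrows the group being described, which is why the same summary (a mean) can be univariate, bivariate or multivariate depending on how many variables define it.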
Presenting descriptive statistics
There are several ways of presenting descriptive statistics in your paper. These include graphs and tables, as well as measures of central tendency, dispersion and association.
- Graphs: Quantitative data can be graphically represented in histograms, pie charts, scatter plots, line graphs, sociograms and geographic information systems. You are likely familiar with the first four from your social statistics course, so let us discuss the last two. Sociograms are tools for “charting the relationships within a group. It’s a visual representation of the social links and preferences that each person has” (Six Seconds, 2020). They are a quick way for researchers to represent and understand networks of relationships among variables. Geographic information systems (GIS) help researchers to develop maps to represent the data according to locations. GIS can be used when spatial data is part of your dataset and might be useful in research concerning environmental degradation, social demography and migration patterns (see Higgins, 2017 for more details about GIS in social research).
There are specific ways of presenting graphs in your paper depending on the referencing style used. Since many social science disciplines use APA, this chapter demonstrates the presentation of data according to the APA referencing style. Box 10.3 below outlines some guidance for presenting graphs and other figures in your paper according to the APA format, while Box 10.4 provides tips for presenting descriptives for continuous variables.
Box 10.3 – Graphs and Figures in APA
Graphs and figures presented in APA must follow the guidelines linked below.
Source: APA. (2022). Figure Setup. American Psychological Association. https://apastyle.apa.org/style-grammar-guidelines/tables-figures/figures
Box 10.4 – Tips for Presenting Descriptives for Continuous Variables
- Remember, we do not calculate means for nominal and ordinal variables. We only describe the percentages for each attribute.
- For continuous variables (ratio/interval), we do not describe percentages; we describe means, range (min, max), standard errors and standard deviations.
- Present all the continuous variables in one table.
- Variables (not attributes) go in the rows.
- Use separate columns for each descriptive statistic (Mean, S.E., Std. Deviation, Min, Max, N).
To provide a practical illustration of the tips presented in Box 10.4, Table 10.2 shows some hypothetical data of what a descriptive table might look like in your paper (following APA guidance).
Table 10.2 - Descriptive Statistics for Key Variables in a Hypothetical Study

Dependent Variables | N | Min. | Max. | Mean | SE | SD |
---|---|---|---|---|---|---|
Age | 250 | 15 | 40 | 26.7 | 1.25 | 2.17 |
Perception about online learning | 250 | 1 | 5 | 2.75 | 0.18 | 0.39 |
Grades | 250 | 15 | 95 | 72.56 | 2.08 | 9.52 |
Number of hours studied per week | 250 | 0 | 120 | 25 | 3.89 | 7.22 |
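As a rough sketch of how a table laid out like the one above could be assembled, the following Python example computes N, minimum, maximum, mean, standard error and standard deviation for each continuous variable and prints one row per variable. The raw values, the variable names and the helpers `describe` and `format_cell` are hypothetical. The sketch assumes the sample standard deviation (n − 1 denominator) and takes the standard error of the mean as SD divided by the square root of N.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical raw data; variable names and values are invented for illustration.
data = {
    "Age": [19, 22, 25, 31, 28, 24],
    "Hours studied per week": [10, 25, 0, 40, 15, 30],
}

COLUMNS = ["N", "Min", "Max", "Mean", "SE", "SD"]


def describe(values):
    """Return the descriptive statistics for one continuous variable."""
    n = len(values)
    sd = stdev(values)  # sample standard deviation (n - 1 in the denominator)
    return {
        "N": n,
        "Min": min(values),
        "Max": max(values),
        "Mean": mean(values),
        "SE": sd / sqrt(n),  # standard error of the mean
        "SD": sd,
    }


def format_cell(column, value):
    """Show counts and observed extremes as-is; round computed statistics."""
    if column in ("N", "Min", "Max"):
        return str(value)
    return f"{value:.2f}"


rows = {name: describe(values) for name, values in data.items()}

# Variables go in the rows; each statistic gets its own column.
print("Variable | " + " | ".join(COLUMNS))
for name, stats in rows.items():
    cells = [format_cell(column, stats[column]) for column in COLUMNS]
    print(f"{name} | " + " | ".join(cells))
```

Keeping the statistics in a fixed column order makes it straightforward to add further continuous variables as extra rows without changing the layout.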
Frequency distributions are tables that summarize the distribution of variables by reporting the number of cases contained in each category of the variable. Frequency distributions are best used to represent nominal and ordinal variables but typically not continuous (interval and ratio) variables because of the potentially large number of categories. APA has specific guidelines for presenting tables (including frequency tables, correlation tables, factor analysis tables, analysis of variance tables, and regression tables); see the following box.
Box 10.5 – Presenting Tables in APA
Tables presented in APA are required to follow the APA guidelines outlined in the following link.
Source: APA. (2021). Table Setup. American Psychological Association. https://apastyle.apa.org/style-grammar-guidelines/tables-figures/tables
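The frequency distributions described above can be sketched in a few lines of Python. This is an illustrative example only: the variable (mode of course delivery) and the responses are hypothetical, and the printed layout is a plain-text approximation that would still need to be formatted according to the APA table guidelines.

```python
from collections import Counter

# Hypothetical responses to a nominal variable (invented for illustration).
responses = [
    "Online", "In person", "Online", "Hybrid",
    "Online", "In person", "Hybrid", "Online",
]

counts = Counter(responses)
total = sum(counts.values())

# Each category maps to (frequency, percentage of all cases).
frequency_table = {
    category: (count, 100 * count / total)
    for category, count in counts.most_common()
}

print("Category | Frequency | Percent")
for category, (count, percent) in frequency_table.items():
    print(f"{category} | {count} | {percent:.1f}")
print(f"Total | {total} | 100.0")
```

Because every distinct value becomes its own row, the same approach applied to a continuous variable would produce a very long table, which is why frequency distributions suit nominal and ordinal variables best.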
Measures of central tendency & Dispersion
Measures of central tendency are values that describe a set of data by identifying the central position within it. These include the mean, median and mode; the mean is often reported as a point estimate together with a confidence interval. Measures of dispersion tell how spread out a variable’s values are. Key measures of dispersion are the range, variance and standard deviation, and skewness is commonly reported alongside them to describe the asymmetry of the distribution. In your paper, you will typically report on N (number of cases), SD (standard deviation) and M (mean).
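For readers who want to see how these measures are obtained, here is a minimal Python sketch using the standard `statistics` module. The grades are hypothetical. The variance and standard deviation use the sample (n − 1) formulas, and skewness is computed with the simple moment coefficient (the third central moment divided by the cubed population standard deviation); statistical packages may apply a small-sample adjustment, so their skewness values can differ slightly.

```python
from statistics import mean, median, mode, pstdev, stdev, variance

# Hypothetical course grades (invented for illustration).
grades = [55, 68, 68, 70, 72, 75, 80, 88]

n = len(grades)

# Measures of central tendency.
m = mean(grades)
med = median(grades)
mo = mode(grades)

# Measures of dispersion (sample formulas, n - 1 in the denominator).
value_range = max(grades) - min(grades)
var = variance(grades)
sd = stdev(grades)

# Skewness as the third central moment over the cubed population SD
# (an assumption of this sketch; other formulas exist).
third_moment = sum((x - m) ** 3 for x in grades) / n
skew = third_moment / pstdev(grades) ** 3

print(f"N = {n}, M = {m:.2f}, Mdn = {med:.2f}, Mode = {mo}")
print(f"Range = {value_range}, Variance = {var:.2f}, SD = {sd:.2f}")
print(f"Skewness = {skew:.2f}")
```

Note that the standard deviation is the square root of the variance, so the two values in any output table should always agree with each other.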
Consider the output from SPSS as presented in Table 10.3. Note that even though the SPSS output includes all the statistics that you need for central tendency, you will need to convert this table so it fits APA standards (see Table 10.2 and Box 10.5). We encourage you to practice by converting Table 10.3 to the APA standard for presenting descriptive statistics.
Table 10.3 - Sample Output from SPSS Showing Hypothetical Grades in a Course

Descriptive Statistic | Course Grades |
---|---|
N (Valid) | 1525 |
N (Missing) | 30 |
Mean | 72.56 |
Median | 70.45 |
Mode | 68.00 |
Standard Deviation | 9.52 |
Variance | 90.63 |
Range | 50 |
See UBC Research Commons for tutorials on how to generate and interpret measures of central tendency and dispersion in SPSS: https://researchcommons.library.ubc.ca/introduction-to-spss-for-statistical-analysis/
In your paper, you are most likely going to report on N, SD and M (see Box 10.3.3.2). You would simply report the findings as follows:
“The computed measures of central tendency and dispersion were as follows: N=1525, M=72.56, SD=9.52.”
You should never leave your results without interpretation. Hence, you might add a sentence such as:
“The average grade in this course is typical at the university, but the large standard deviation indicates that there was considerable variation around the mean”.
Remember that the mean (M) might not be the best measure of central tendency to report. The kind of variable dictates the best measure of central tendency. For instance, when discussing nominal variables, it is best to report the mode; for ordinal variables, it is best to report the median; and for interval/ratio variables (as in our example above), it is best to report the mean. However, if interval/ratio variables are skewed, it is best to report the median.
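The decision rule above can be expressed as a short Python sketch. The function names, the sample data and especially the skewness cutoff of 1.0 are assumptions made for illustration; the text does not specify a threshold for when a variable counts as skewed, so treat `SKEW_CUTOFF` as a placeholder rule of thumb. For ordinal variables the sketch assumes numerically coded categories and uses `median_low` so that the reported median is always an observed category.

```python
from statistics import mean, median, median_low, mode, pstdev

SKEW_CUTOFF = 1.0  # assumed rule of thumb; not specified in the text


def skewness(values):
    """Moment coefficient of skewness (0 for a symmetric distribution)."""
    m = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return 0.0
    return sum((x - m) ** 3 for x in values) / len(values) / sd ** 3


def central_tendency(values, level):
    """Pick a measure of central tendency from the level of measurement."""
    if level == "nominal":
        return "mode", mode(values)
    if level == "ordinal":
        # median_low always returns an observed (coded) category.
        return "median", median_low(values)
    if level in ("interval", "ratio"):
        if abs(skewness(values)) > SKEW_CUTOFF:
            return "median", median(values)
        return "mean", mean(values)
    raise ValueError(f"Unknown level of measurement: {level}")


# Hypothetical examples (all values invented for illustration).
print(central_tendency(["BA", "MA", "BA", "PhD"], "nominal"))
print(central_tendency([1, 2, 2, 3, 5], "ordinal"))
print(central_tendency([70, 72, 74, 76, 78], "interval"))
print(central_tendency([20, 22, 25, 27, 30, 250], "ratio"))
```

In the last example a single very large value pulls the mean far above most of the cases, so the sketch falls back to the median, as the paragraph above recommends for skewed interval/ratio variables.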
References
APA. (2022). Figure Setup. American Psychological Association. https://apastyle.apa.org/style-grammar-guidelines/tables-figures/figures
APA. (2021). Table Setup. American Psychological Association. https://apastyle.apa.org/style-grammar-guidelines/tables-figures/tables
Bhattacherjee, A. (2012). Social Science Research: Principles, Methods, and Practices. Textbooks Collection. https://scholarcommons.usf.edu/cgi/viewcontent.cgi?referer=&httpsredir=1&article=1002&context=oa_textbooks
Higgins, A. (2017). Using GIS in social science research. Susplace: Sustainable Place Shaping. https://www.sustainableplaceshaping.net/using-gis-in-social-scientific-research/
Sociograms: Tools for charting the relationships and visually representing the social links and preferences of individuals.
Frequency distributions: Representations, in either a graphical or tabular format, that display the number of observations in each category of a variable.