Chapter 6 Sampling, the Basis of Inference

6.1 Populations and Samples

 

Before we start, yet another word of warning: what follows is only a brief overview of sampling and the types of sampling. It provides the background necessary for statistical inference, but the main learning objective here is inference, not everything there is to know about sampling methods and their intricacies. Thus, if this is the first time you encounter the concept, you would be better served by reading a thorough introduction to sampling, and to the benefits and downsides of the different sampling methods, in virtually any research methods textbook, as that would give a more comprehensive treatment than I do here.

 

With that in mind, onward to the preliminaries: populations and samples.

 

In the introduction to this chapter, I asked a question: Do Canadians approve of immigration? How, do you think, can we go about answering it?

 

Presumably, the simplest way to investigate this would be to ask: imagine we contacted everyone and asked them whatever version of the question we have decided on (i.e., whichever way we have operationalized our variable, attitudes to immigration), noting everyone’s responses. Many governments, both historically and to this day, have employed this method for gathering information.

 

When we gather information from everyone in whom we are interested, we are doing a census. You probably know that the Government of Canada, through Statistics Canada, conducts a census of the Canadian population every five years. (You might even have filled out the form yourself, if you are of age, or seen your parents do it otherwise.) Can the government (or any researcher or agency, for that matter) then collect information about everything it might need or want through censuses, every time the information is required?

 

Theoretically, it’s an option. In practice, no way: it would be prohibitively expensive. You might find the reason prosaic, but any research is limited by the availability of resources, money and time. Asking one additional question on a questionnaire to one additional person has costs, which add up quickly the more questions and the more people are included in the study. Thus, censuses of the population are enormous undertakings reserved for collecting only really important (typically demographic) information, and are usually quite limited in scope.[1][2]

 

Given that conducting a census for everything anyone (researchers, governments, etc.) might want information on is generally impractical, what can be done when information about a population is needed?

 

Here is where statistics saves the day: with probability theory and inferential statistics, we can use the next best thing to a census — random-sample surveys! My job in this chapter will be to convince you that you don’t need to do a census of the population you want to study as long as you have a well-selected sample.
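To get a feel for this claim before the formal argument, here is a minimal Python sketch. Everything in it is made up for illustration: a hypothetical population of 100,000 people of whom exactly 60% approve, and a random sample of 2,000 of them. It is not based on real data about Canadians, and it is only one simple way to set up such a comparison.

```python
import random

# Hypothetical population: 100,000 people, 60% of whom approve (made-up figure).
POPULATION_SIZE = 100_000
population = [True] * 60_000 + [False] * 40_000

# "Census": ask everyone in the population.
census_share = sum(population) / POPULATION_SIZE

# Random-sample survey: ask 2,000 people picked at random, without replacement.
rng = random.Random(42)  # fixed seed so the run is repeatable
sample = rng.sample(population, 2_000)
sample_share = sum(sample) / len(sample)

print(f"Census share approving: {census_share:.3f}")
print(f"Sample share approving: {sample_share:.3f}")
```

In a run like this, the share from the sample should land close to the share from the "census", even though only 2% of the hypothetical population was asked. How close, and how often, is what the rest of the chapter is about.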

 

You have undoubtedly taken a survey at some point in your life, in one form or another: a survey for which you were selected or invited, or for which you volunteered, and which included other people but definitely not everyone. In other words, unless we are discussing a census, surveys are typically administered to samples (i.e., sub-groups) of the population. However, not all surveys are created equal: those that can “substitute” for the population, as it were, rely on the just-mentioned technique of random sampling.

 

But first off, let’s establish what samples and populations really are. While it’s intuitive to think of population as the population of a country (say, 36.7 million Canadians), and of sample as a sub-group of that population (say, ten thousand Canadians), this is only a special case of the general terms sample and population. In research, a population is a group encompassing everyone on whom we want information, i.e., everyone (or everything) we want to study. Considering that we might not be studying people (recall that the units of analysis can be countries, organizations, etc.), we say that a population encompasses all elements under study. This means that we could have study populations such as “countries in South America”, or “hospitals and medical clinics in Toronto”, or “departments of sociology in Canadian universities”, etc.

 

As well, while the elements may be people, instead of the whole population of a country, we might be interested in studying “university students in Canada,” or “early childhood educators in British Columbia,” or “dog walkers in downtown Vancouver,” or “Telus company employees,” or “dentists in Surrey, BC,” etc. All of these examples are of populations that can be defined as such by researchers interested in them.

 

Thus, a sample is any sub-group of the population under study. For example, if I decide to study “KPU students”, my study population would be defined as “everyone registered as a student at KPU”. If I select a hundred students for my study, I would have a sample of N=100.
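The KPU example can be sketched in a few lines of Python. The registry below is hypothetical: the student IDs and the figure of 20,000 registered students are invented for illustration. The sketch picks the hundred students at random with the standard library's `random.sample`, which is just one possible way of selecting a sub-group; any hundred registered students would count as a sample in the sense defined above.

```python
import random

# Hypothetical study population: everyone registered as a student.
# The IDs and the count of 20,000 are made up for this sketch.
registry = [f"S{i:05d}" for i in range(1, 20_001)]

# Select a sample of 100 distinct students from the registry.
rng = random.Random(2024)  # fixed seed so the run is repeatable
sample = rng.sample(registry, 100)

print(len(registry))  # size of the study population
print(len(sample))    # size of the sample
print(sample[:5])     # a peek at the first few selected IDs
```

Every element of `sample` comes from `registry`, and no student appears twice, which is what makes it a sub-group of the population under study.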

 

Ultimately, again, what the population for a particular study is depends on what the researcher wants to study.

 

If we go back to the Do Canadians approve of immigration? example, the population under study would be, of course, “Canadians”, but we have to be very careful how we define “Canadians”: Are we interested in all Canadians, regardless of where they live or are at the moment? (I.e., do we include ex-pats, people with dual citizenship residing abroad, Canadian tourists travelling the world, etc.?) Or do we only want to study Canadians in Canada? And do we want to study permanent residents in Canada too, or only people with Canadian passports? However we choose to define our study population, the definition has to be precise and based on objective criteria that we follow consistently.
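One way to see what “precise” and “objective” mean in practice is to write the definition down as a rule that returns yes or no for any given person. The Python sketch below does that under stated assumptions: the `Person` record, its fields, and the two switches are hypothetical names chosen for illustration, and real studies would need more careful criteria.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    is_citizen: bool
    is_permanent_resident: bool
    lives_in_canada: bool


def in_study_population(person, include_permanent_residents=False,
                        include_abroad=False):
    """Return True if the person meets the chosen definition of 'Canadians'."""
    has_status = person.is_citizen or (
        include_permanent_residents and person.is_permanent_resident
    )
    in_scope_location = person.lives_in_canada or include_abroad
    return has_status and in_scope_location


people = [
    Person("A", True, False, True),    # citizen living in Canada
    Person("B", True, False, False),   # citizen living abroad
    Person("C", False, True, True),    # permanent resident in Canada
    Person("D", False, False, True),   # visitor in Canada
]

# Narrow definition: citizens currently living in Canada.
narrow = [p.name for p in people if in_study_population(p)]
# Broad definition: citizens anywhere, plus permanent residents.
broad = [p.name for p in people
         if in_study_population(p, include_permanent_residents=True,
                                include_abroad=True)]

print(narrow)  # ['A']
print(broad)   # ['A', 'B', 'C']
```

The same four people yield different study populations depending on the definition, which is why the definition has to be fixed in advance and applied the same way to everyone.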

 

Once a researcher has decided on and defined a study population, and collecting data on all elements of that population is considered unfeasible, the researcher needs to select a sample for their study. (As you will eventually see, collecting data on all elements of the population might even be undesirable, as it’s unnecessary, even if it were feasible.)

 

The procedure of selecting a sample is called sampling. There are two broad types of sampling, non-random and random, and the next section is devoted to them.


  1. For more information on the Canadian census program see here: https://www12.statcan.gc.ca/census-recensement/index-eng.cfm
  2. Censuses of the population are so expensive that some governments cannot afford to do them (or at least not regularly) and instead rely on survey data from samples. As well, in some places censuses can be fraught with controversy due to racial/ethnic and/or religious tensions, etc., and are therefore avoided (Weeks 2015).

License

Simple Stats Tools Copyright © by Mariana Gatzeva. All Rights Reserved.
