Effect analysis and related approaches
8.4 Experimental and quasi-experimental research designs
Experimentation generally refers to the manipulation and control of the research conditions to allow for the testing of some hypothesis. However, different types exist and refer to specific research processes. Shadish et al. define an experiment as a “study in which an intervention is deliberately introduced to observe its effects” (Shadish et al., 2002, p. 12).
Experimental designs require that units or participants be randomly allocated to the experimental group (the one that receives the intervention), also called a ‘treatment group’, or to the control group (McDavid & Huse, 2019).
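Random allocation can be illustrated with a minimal sketch. The function name `randomly_assign`, the participant labels, and the even split between groups are assumptions made for illustration only; actual studies often use more elaborate schemes such as stratified or blocked randomization.

```python
import random


def randomly_assign(participants, seed=None):
    """Split participants at random into a treatment and a control group."""
    rng = random.Random(seed)      # a seeded generator makes the allocation reproducible
    shuffled = list(participants)  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}


# Hypothetical participant identifiers P01 to P20
participants = [f"P{i:02d}" for i in range(1, 21)]
groups = randomly_assign(participants, seed=42)
print(len(groups["treatment"]), len(groups["control"]))
```

Because chance alone decides who receives the intervention, any pre-existing differences between the two groups are expected to balance out on average.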
Quasi-experimental research, on the other hand, does not use randomization to assign the intervention to participants. Assignment to conditions can be made by the participants themselves or by any other person. The researcher still controls the conditions of the study (i.e., the conditions of the experiment, dosage, etc.), and efforts are made to make the experimental and control groups as similar as possible using methods such as matching participants across groups.
In most experimental or quasi-experimental designs, key outcome variables are measured before the program is implemented, and then after the program has been running long enough to be considered fully implemented. The ‘before’ measure is called a pre-test, and the ‘after’ measure is a post-test. Comparisons of the pre-test to the post-test averages will show whether there was an average change in the outcome variable for the treatment group (McDavid & Huse, 2019, p. 43).
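The pre-test/post-test comparison can be sketched as simple arithmetic. All scores below are hypothetical, and the helper `average_change` is an illustrative name, not part of any cited method. The last step, subtracting the control group's change from the treatment group's change, is an added assumption that applies only to designs that include a comparison group.

```python
from statistics import mean


def average_change(pre_scores, post_scores):
    """Return the post-test average minus the pre-test average."""
    return mean(post_scores) - mean(pre_scores)


# Hypothetical outcome scores for five participants per group
treatment_pre = [52, 48, 55, 50, 45]
treatment_post = [60, 55, 63, 58, 54]
control_pre = [51, 49, 54, 50, 46]
control_post = [53, 50, 56, 51, 50]

treatment_change = average_change(treatment_pre, treatment_post)
control_change = average_change(control_pre, control_post)

# Change in the treatment group beyond the change seen without the intervention
net_change = treatment_change - control_change
print(treatment_change, control_change, net_change)
```

In this invented example, the treatment group improves by 8 points on average and the control group by 2, so only about 6 points of the change are plausibly linked to the intervention.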
Natural experiment research is “not really an experiment because the cause usually cannot be manipulated; [an example is] a study that contrasts a naturally occurring event such as an earthquake with a comparison condition” (Shadish et al., 2002, p. 12).
Various experimental designs exist: Pre-test/Post-test, Post-test only, and longitudinal post-test designs that measure outcomes at several points in time (Champagne et al., 2011a; McDavid et al., 2019; Weiss, 1998). A diversity of quasi-experimental and non-experimental research designs also exists (such as static group comparison, before-after design, time series, and case study), each presenting distinct potential threats to internal and external validity. Summary tables of experimental and quasi-experimental designs and their relative strengths and potential biases are presented in McDavid et al. (2019, p. 133) and Champagne et al. (2011a, p. 185).
Experimental and quasi-experimental research use the following symbols and acronyms for describing designs:
- R: randomization
- X: treatment/intervention
- O: observation
For example, a Post-test research design would be represented this way:
X O1
A Comparative Pre-test/Post-test design with randomization would be represented as:
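| R | → | O1 X O2 |
| R | → | O1 O2 |
Both groups are randomly assigned (R) and observed before and after, but only the experimental group receives the intervention (X).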
The quality of the study determines the confidence in the results and in the causal relationship between a specific intervention and the effects. The study quality depends on different elements and potential threats (Champagne et al., 2011a):
- Validity of instruments and measurements
- Internal validity
- External validity
- Validity of statistical analysis
The following section provides a brief overview of the first three elements.
Validity of instruments and measurements
The validity of an instrument is its actual ability to correctly measure what it is supposed to measure (Champagne et al., 2011a, p. 168). This includes:
- Content validity: Refers to the extent to which an instrument accurately represents relevant dimensions of the concept it is intended to measure. For example, mortality rate alone would be a poor measure of overall population health.
- Criterion validity: Refers to the extent to which an instrument correlates with and accurately predicts the outcome it is intended to measure. An example includes examining whether success in undergraduate studies correlates with admissions exam scores, especially when such exams are used as a tool to predict academic success.
- Construct validity: Refers to whether the test is truly measuring the intended construct and not something else. For example, IQ tests are not necessarily a good measure of intelligence.
Content validity assesses how well the test covers all the relevant aspects of the concept it is supposed to measure, while construct validity evaluates whether the test is measuring the intended construct and not something else.
Internal validity
Internal validity corresponds to the confidence that the variations observed in the outcomes can be attributed to the intervention (Champagne et al., 2011a). Several elements can threaten internal validity (Champagne et al., 2011a, p. 182; McDavid et al., 2019; Ohlund & Yu, N/A):
- History: Something influences the outcome during the study other than the treatment.
- Selection: Participants differ in some relevant ways between the control and the experimental groups.
- Maturation: Changes such as fatigue or experience acquired during the experiment.
- Regression to the mean/Statistical regression: Extreme scores on the pretest tend to be less extreme on the post-test.
- Attrition/experimental mortality: Participants drop out—this is a threat if attrition differs between the control and experimental groups.
- Testing: Participants become familiar with the testing conditions when the same tests are applied multiple times.
- Instrumentation: Changes in measuring instruments, observers, etc., between the pretest and post-test can affect outcomes.
- Selection-based interactions: Selection can interact with other validity threats. For example, a program tested on two groups with different socio-economic status could yield biased results if the groups also mature at different rates or are exposed to different outside events.
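Of the threats listed above, regression to the mean is perhaps the least intuitive, and a small simulation can make it concrete. The sketch below is illustrative only: the score distribution, the amount of measurement noise, and the choice to select the lowest-scoring 10% are all assumptions, and no intervention is applied at any point.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the simulation is reproducible
n = 10_000

# Each observed score = stable underlying level + independent measurement noise
true_level = [rng.gauss(50, 10) for _ in range(n)]
pretest = [level + rng.gauss(0, 10) for level in true_level]
posttest = [level + rng.gauss(0, 10) for level in true_level]

# Select participants because of their extreme (lowest 10%) pretest scores
cutoff = sorted(pretest)[n // 10]
selected = [i for i in range(n) if pretest[i] < cutoff]

pre_mean = mean([pretest[i] for i in selected])
post_mean = mean([posttest[i] for i in selected])
print(f"pretest mean: {pre_mean:.1f}, post-test mean: {post_mean:.1f}")
```

The selected group scores noticeably higher on the post-test even though nothing was done to it, because part of its extreme pretest score was noise that does not repeat. A study that enrols participants because of extreme pretest scores and has no comparison group could mistake this shift for a program effect.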
Table 8.1 presents internal validity threats to some quasi-experimental and experimental research designs. For each of the following research designs, “yes” indicates possible threats to internal validity.
Table 8.1 Internal Validity Threats to Some Quasi-Experimental and Experimental Research Designs
| Model | Notation | History | Selection | Maturation | Statistical regression | Attrition/Mortality | Testing | Instrumentation | Selection-based interactions |
| Quasi-Experimental Research Designs | | | | | | | | | |
| Pre-post test design | O X O | yes | yes | yes | yes | yes | yes | | |
| Static group comparison design | X O | yes | yes | yes | yes | | | | |
| | O | | | | | | | | |
| Pre-post comparison group design | O X O | yes | yes | | | | | | |
| | O O | | | | | | | | |
| Case study design | X O | yes | yes | yes | yes | yes | | | |
| Single time series design | OOO X OOO | yes | yes | yes | | | | | |
| Comparative time series design | OOO X OOO | yes | yes | | | | | | |
| | OOO OOO | | | | | | | | |
| Experimental Research Designs | | | | | | | | | |
| Pre-post test design with control group and randomization | R O X O | yes | | | | | | | |
| | R O O | | | | | | | | |
| Post test design with control group and randomization | R X O | yes | | | | | | | |
| | R O | | | | | | | | |
Source: Compiled from McDavid et al. (2019) and Champagne et al. (2011a).
External validity
External validity relates to the capacity to generalize the results beyond the context of the research. It refers to the extent to which a causal relationship “holds over variations in persons, settings, treatments and outcomes” (Shadish et al., 2002, p. 86). Threats to external validity can be summarized into five categories (see Exhibit 8.1).
Exhibit 8.1: Threats to external validity: Reasons why inferences about how study results would hold over variations in persons, settings, treatments, and outcomes may be incorrect
Interaction of the Causal Relationship with Units: An effect found with certain kinds of units might not hold if other kinds of units had been studied.
Interaction of the Causal Relationship over Treatment Variations: An effect found with one treatment variation might not hold with other variations of treatment, or when that treatment is combined with other treatments, or when only part of that treatment is used.
Interaction of the Causal Relationship with Outcomes: An effect found on one kind of outcome observation may not hold if other outcome observations were used.
Interaction of the Causal Relationship with Settings: An effect found in one kind of setting may not hold if other kinds of settings were to be used.
Context-Dependent Mediation: An explanatory mediator of a causal relationship in one context may not mediate in another context.
Source: Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Houghton Mifflin Company, p. 87.
In experimental research, several biases can potentially prevent the extrapolation of results. Champagne et al. (2011a) name the following:
- Contagion/diffusion of treatment: Communication or interaction between the control and experimental groups that compromises their independence and potentially influences outcomes.
- Compensatory reactions: The control group changes its behavior upon realizing it will not receive the treatment.
- Compensatory interventions: Professionals attempt to compensate for the absence of treatment in the control group.
- Pleasing the evaluator or expectations of the researcher: Participants may change their behavior due to a desire to please the evaluator or because of perceived expectations from the researcher.
See Champagne et al. (2011a, p. 183) for a more extensive list.
Each experimental research design presents strengths and threats to validity. Fully randomized designs are intended to prioritize internal validity and, hence, meet the conditions to determine whether there is a causal relationship between the treatment and the outcome variable. To choose the most promising design for a specific context, consult specialized resources such as Weiss (1998), Shadish et al. (2002), McDavid et al. (2019), and Brousselle et al. (2011b).
Experimental designs are essential in some contexts. For example, before new pharmaceuticals are commercialized, they must be shown to be effective and safe through appropriate testing. New drugs are tested through rigorous protocols, first in laboratory conditions and then on humans, using randomization among participants who share similar characteristics to isolate the effects and to gain confidence that the observed effects are related to the intervention. At this stage, the question of generalizability remains unresolved: even if such settings provide high confidence in the causal relationship, they limit the capacity to generalize results.
Similarly, randomized controlled trials (RCTs) are limited in participant diversity. For example, the first vaccines for COVID-19 were tested on children only after having been tested on adults, which delayed vaccine accessibility for this population. Some past RCTs have also been the subject of controversy. For example, during early HIV/AIDS drug testing, some participants reportedly shared doses in the hope of improving their chances of extending their lives. Finally, experimental research is not suitable in all contexts. It is not always possible to create experimental conditions that isolate the causal relations between the intervention and the effects. Consequently, evaluators have developed and used alternative designs for effect analysis, such as Contribution Analysis and those listed in the section on Impact Evaluation.