8 Subgroup analysis: Were additional comparisons made on segments of the study population?
Most RCT publications report on additional analyses of subgroups from the overall trial population (e.g. examining only participants with diabetes, or only those age >70 years). In theory, such analyses could uncover more individualized treatment effects; however, in practice they are much more likely to be spurious and irreproducible, and therefore misleading. Subgroup analyses are often performed (and emphasized in publications) when studies do not find a statistically significant difference in the overall study population. Therefore, subgroup analysis is often a form of data-mining.
Checklist Questions
- Are statistically significant results in a subgroup being emphasized in the context of a neutral or negative trial?
- Was the subgroup analysis pre-defined?
- Was the direction of the subgroup effect correctly pre-defined?
- Was the subgroup analysis one of a small number of hypotheses tested?
- Is the subgroup variable a characteristic measured at baseline or after randomization?
- Could treatment effect differences between subgroups be attributable to baseline imbalances?
- Is the subgroup effect statistically significant?
- Is the subgroup effect consistent within and across trials?
- [Systematic reviews/meta-analyses only] Is the effect suggested by comparisons within rather than between studies?
Are statistically significant results in a subgroup being emphasized in the context of a neutral or negative trial?
When a trial's overall result is not statistically significant, subgroup analyses can become a form of data-mining: a significant finding in one subgroup may be highlighted to offset a primary endpoint that failed to cross the threshold of statistical significance.
Was the subgroup analysis pre-defined?
Subgroup analyses that were not pre-defined in the protocol may be a form of data-mining, and are vulnerable to finding a difference by chance. Avoid making clinical decisions based on unanticipated significant subgroup differences (i.e. discovered post hoc) until they have been replicated in other studies.
This is particularly concerning for continuous variables, such as age or cholesterol level, that are dichotomized via non-prespecified cutoffs (Schandelmaier S et al.). For example, LDL cholesterol could be dichotomized via numerous arbitrary cutoffs (e.g. >3.5, >4.0…). If not pre-specified, the cutoff could be selected based on whichever value showed the most impressive or statistically significant result. Such data-mining efforts are unlikely to uncover true subgroup differences.
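The cutoff problem can be illustrated with a small simulation. This is only a sketch under stated assumptions: the LDL distribution, sample size, candidate cutoffs, and function names are all hypothetical, and the simulated treatment has no effect in any subgroup. It compares how often a "significant" subgroup difference appears when the cutoff is fixed in advance versus picked after trying several.

```python
import math
import random
from statistics import mean, variance


def effect_and_se(treated, control):
    """Mean outcome difference (treated minus control) and its standard error."""
    diff = mean(treated) - mean(control)
    se = math.sqrt(variance(treated) / len(treated) + variance(control) / len(control))
    return diff, se


def interaction_p(rows, cutoff):
    """Two-sided interaction p-value (normal approximation) comparing the
    treatment effect above the cutoff with the effect at or below it."""
    cells = {(high, arm): [] for high in (True, False) for arm in (0, 1)}
    for ldl, arm, outcome in rows:
        cells[(ldl > cutoff, arm)].append(outcome)
    if any(len(values) < 2 for values in cells.values()):
        return 1.0  # too few participants in a cell to estimate a variance
    d_high, se_high = effect_and_se(cells[(True, 1)], cells[(True, 0)])
    d_low, se_low = effect_and_se(cells[(False, 1)], cells[(False, 0)])
    z = (d_high - d_low) / math.sqrt(se_high**2 + se_low**2)
    return math.erfc(abs(z) / math.sqrt(2))


def simulate(n_trials=200, n_participants=400, prespecified=4.0,
             candidates=(3.0, 3.5, 4.0, 4.5, 5.0), seed=1):
    """Share of null trials with a 'significant' interaction (p < 0.05) when
    the cutoff is fixed in advance versus chosen after looking at the data."""
    rng = random.Random(seed)
    fixed_hits = 0
    searched_hits = 0
    for _ in range(n_trials):
        # Each row: (baseline LDL, arm, outcome). The outcome ignores the arm,
        # so there is no true treatment effect in any subgroup (assumption).
        rows = [(rng.gauss(4.0, 1.0), rng.randint(0, 1), rng.gauss(0.0, 1.0))
                for _ in range(n_participants)]
        p_fixed = interaction_p(rows, prespecified)
        p_searched = min([p_fixed] + [interaction_p(rows, c) for c in candidates])
        fixed_hits += p_fixed < 0.05
        searched_hits += p_searched < 0.05
    return fixed_hits / n_trials, searched_hits / n_trials


fixed_rate, searched_rate = simulate()
print(f"False-positive rate, pre-specified cutoff: {fixed_rate:.1%}")
print(f"False-positive rate, best of several cutoffs: {searched_rate:.1%}")
```

Because the searched p-value is the minimum over all candidate cutoffs, its false-positive rate can never be lower than that of the single pre-specified cutoff. The exact rates depend on the seed and on the assumed distributions.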
Was the direction of the subgroup effect correctly pre-defined?
Subgroup effects that are significant but go in the direction opposite to what was hypothesized are less credible than correct predictions.
Was the subgroup analysis one of a small number of hypotheses tested?
More comparisons increase the likelihood of finding a difference by chance. See the multiplicity discussion for more information.
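As a rough illustration of how quickly this risk grows, the sketch below computes the probability of at least one false-positive result across several tests. It assumes the tests are independent and that no true effect exists, which real subgroup analyses rarely satisfy exactly; the function name is hypothetical.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive among n_tests independent
    tests, each run at significance level alpha, when no true effect exists."""
    return 1.0 - (1.0 - alpha) ** n_tests


for n_tests in (1, 5, 10, 20):
    rate = familywise_error_rate(n_tests)
    print(f"{n_tests:>2} subgroup tests: {rate:.0%} chance of at least one false positive")
```

Under these assumptions, 10 subgroup tests at a threshold of 0.05 carry roughly a 40% chance of at least one spurious "significant" finding.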
Is the subgroup variable a characteristic measured at baseline or after randomization?
Subgroup analyses of variables measured after randomization may be affected by the interventions, thereby introducing confounding.
Examples of variables measured at baseline:
- Age
- Sex
- Pre-treatment LDL cholesterol
Examples of variables measured after randomization:
- LDL cholesterol achieved after 12 weeks of study intervention in a fixed-dose statin trial
- Success of revascularization in a trial comparing coronary artery bypass grafting surgery to percutaneous coronary intervention in coronary artery disease
Could treatment effect differences between subgroups be attributable to baseline imbalances?
Randomization gives each confounder an equal probability of being distributed to each intervention group in the overall trial population, but it does not guarantee balance between intervention groups within a subgroup. Subgroups are prone to imbalances of potential confounders, especially when they contain a small number of participants.
The exception is when randomization is stratified for the variable that defines the subgroups (e.g. stratified randomization by history of diabetes). In the case of a stratified subgroup, there is a reduced risk of confounder imbalance.
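The effect of subgroup size on chance imbalance can be sketched with a simulation. Everything here is an assumption for illustration: simple (unstratified) 1:1 randomization, a hypothetical binary confounder present in 30% of participants, and a subgroup of 40 participants within a trial of 1000.

```python
import random


def mean_imbalance(n_participants, prevalence=0.3, n_sims=500, seed=7):
    """Average absolute difference between arms in the proportion of
    participants who have a binary confounder, under simple randomization."""
    rng = random.Random(seed)
    total = 0.0
    valid = 0
    for _ in range(n_sims):
        size = [0, 0]        # participants per arm
        with_conf = [0, 0]   # participants with the confounder per arm
        for _ in range(n_participants):
            arm = rng.randint(0, 1)
            size[arm] += 1
            if rng.random() < prevalence:
                with_conf[arm] += 1
        if size[0] == 0 or size[1] == 0:
            continue  # skip the (very unlikely) case of an empty arm
        total += abs(with_conf[0] / size[0] - with_conf[1] / size[1])
        valid += 1
    return total / valid


whole_trial = mean_imbalance(1000)
small_subgroup = mean_imbalance(40)
print(f"Mean imbalance, whole trial (n=1000): {whole_trial:.1%}")
print(f"Mean imbalance, subgroup (n=40): {small_subgroup:.1%}")
```

In this sketch the average between-arm difference in confounder prevalence is several times larger in the small subgroup than in the whole trial, even though both were randomized in the same way.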
Is the subgroup effect statistically significant?
A review of 117 subgroup claims in 64 RCTs found that fewer than 40% of subgroup claims reported in the abstract were statistically significant (Wallach JD et al.).
Statistical significance is determined by examining the p-value for the test for interaction (which tests whether the treatment effect differs across subgroups), not the p-value or 95% CI within a subgroup (Brookes ST et al.). “Positive” subgroup analyses that do not report the p-value for the test for interaction should be ignored.
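When only subgroup-level estimates are published, an approximate interaction test can be computed from the two estimates and their confidence intervals. The sketch below uses one common approach (a z-test on the difference of log ratios, assuming the two subgroups are independent and the estimates are approximately normal on the log scale). The hazard ratios are made-up numbers, not data from any trial, and the function names are hypothetical.

```python
import math


def se_from_ci(lower: float, upper: float) -> float:
    """Standard error of a log ratio, recovered from its 95% CI limits."""
    return (math.log(upper) - math.log(lower)) / (2 * 1.96)


def interaction_p_value(ratio_a, ci_a, ratio_b, ci_b) -> float:
    """Two-sided p-value for the difference between two subgroup ratio
    estimates (e.g. hazard ratios), using a normal approximation."""
    diff = math.log(ratio_a) - math.log(ratio_b)
    se = math.sqrt(se_from_ci(*ci_a) ** 2 + se_from_ci(*ci_b) ** 2)
    z = diff / se
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical subgroup results (illustrative numbers only):
# subgroup A: HR 0.70 (95% CI 0.50 to 0.98), "significant" on its own
# subgroup B: HR 0.95 (95% CI 0.75 to 1.20), not significant on its own
p_interaction = interaction_p_value(0.70, (0.50, 0.98), 0.95, (0.75, 1.20))
print(f"p for interaction = {p_interaction:.2f}")
```

In this made-up example, subgroup A's confidence interval excludes 1 while subgroup B's does not, yet the interaction p-value is well above 0.05, so the data do not support a claim that the treatment effect differs between the two subgroups.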
Is the subgroup effect consistent within and across trials?
A true subgroup effect is also more likely if additional studies replicate the effect; however, this rarely occurs. One review found that only approximately 10% of positive subgroup analyses were replicated in a subsequent trial designed to confirm the effect within the subgroup (Wallach JD et al.).
[Systematic Reviews/Meta-Analyses Only] Is the effect suggested by comparisons within rather than between studies?
Subgroup effects identified between studies, such as between two trials in a systematic review, may be due to methodological or clinical differences between trials rather than true associations with the different subgroups.
Randomized controlled trials are those in which participants are randomly allocated to two or more groups which are given different treatments.
A systematic review is a review that systematically identifies all potentially relevant studies on a research question. The aggregate of studies is then evaluated with respect to factors such as risk of bias of individual studies or heterogeneity among results. The qualitative combination of results is a systematic review.
A meta-analysis is a quantitative combination of the data obtained in a systematic review.
A primary outcome is an outcome on which trial design choices are based (e.g. sample size calculations). Primary outcomes are not necessarily the most important outcomes.
Stratified randomization is a multistage approach to randomization in which participants are initially allocated to strata based on certain defined commonalities (e.g. stratified according to LDL levels). After stratification, participants are randomized within their respective stratum.