Subgroups analysis: Were additional comparisons made on segments of the study population?

Ricky Turgeon; Blair MacDonald

Most RCT publications report on additional analyses of subgroups from the overall trial population (e.g. examining only participants with diabetes, or only those age >70 years). In theory, such analyses could uncover more individualized treatment effects; however, in practice they are much more likely to be spurious and irreproducible, and therefore misleading. Subgroup analyses are often performed (and emphasized in publications) when studies do not find a statistically significant difference in the overall study population. Therefore, subgroup analysis is often a form of data-mining.

Checklist Questions

Are statistically significant results in a subgroup being emphasized in the context of a neutral or negative trial?

Was the subgroup analysis pre-defined?

Was the direction of the subgroup effect correctly pre-defined?

Was the subgroup analysis one of a small number of hypotheses tested?

Is the subgroup variable a characteristic measured at baseline or after randomization?

Could treatment effect differences between subgroups be attributable to baseline imbalances?

Is the subgroup effect statistically significant?

Is the subgroup effect consistent within and across trials?

[Systematic Reviews/Meta-Analyses Only] Is the effect suggested by comparisons within rather than between studies?

Are statistically significant results in a subgroup being emphasized in the context of a neutral or negative trial?

Subgroup analyses can become a form of data-mining when the overall result is not statistically significant: a "positive" subgroup may be highlighted precisely because the primary endpoint failed to cross the threshold of statistical significance.

Was the subgroup analysis pre-defined?

Subgroup analyses that were not pre-defined in the protocol may be a form of data-mining, and are vulnerable to finding a difference by chance. Avoid making clinical decisions based on unanticipated significant subgroup differences (i.e. discovered post hoc) until they have been replicated in other studies.

This is particularly concerning for continuous variables, such as age or cholesterol level, that are dichotomized using non-prespecified cutoffs (Schandelmaier S et al.). For example, LDL cholesterol could be dichotomized at numerous arbitrary cutoffs (e.g. >3.5, >4.0…). If not pre-specified, the cutoff could be chosen based on whichever value shows the most impressive or statistically significant result. Such data-mining efforts are unlikely to uncover true subgroup differences.
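To illustrate why non-prespecified cutoffs are hazardous, the following is a minimal simulation sketch, not an analysis of any real trial. It assumes a hypothetical trial with no true treatment effect, a normally distributed baseline LDL cholesterol (mean 4.0, SD 1.0; all values are invented for illustration), and a simple large-sample z-test within each subgroup. It compares how often a single pre-specified cutoff yields p < 0.05 with how often at least one of several candidate cutoffs does.

```python
import math
import random
import statistics


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def subgroup_p_value(treated, control):
    """Approximate z-test comparing mean outcomes between arms of one subgroup."""
    diff = statistics.mean(treated) - statistics.mean(control)
    se = math.sqrt(statistics.variance(treated) / len(treated)
                   + statistics.variance(control) / len(control))
    return two_sided_p(diff / se)


def p_values_by_cutoff(rng, cutoffs, n=300):
    """Simulate one null trial; return {cutoff: p-value} for the 'LDL > cutoff' subgroup."""
    # Each hypothetical patient: (baseline LDL, on treatment?, outcome unrelated to treatment)
    patients = [(rng.gauss(4.0, 1.0), rng.random() < 0.5, rng.gauss(0.0, 1.0))
                for _ in range(n)]
    results = {}
    for cutoff in cutoffs:
        treated = [y for ldl, on_trt, y in patients if ldl > cutoff and on_trt]
        control = [y for ldl, on_trt, y in patients if ldl > cutoff and not on_trt]
        if len(treated) >= 2 and len(control) >= 2:  # variance needs 2+ values
            results[cutoff] = subgroup_p_value(treated, control)
    return results


def false_positive_rates(n_trials=500, seed=1,
                         cutoffs=(3.0, 3.5, 4.0, 4.5, 5.0), prespecified=4.0):
    """Share of null trials 'significant' at the pre-specified cutoff vs. at any cutoff."""
    rng = random.Random(seed)
    hits_prespecified = 0
    hits_any = 0
    for _ in range(n_trials):
        p = p_values_by_cutoff(rng, cutoffs)
        hits_prespecified += p.get(prespecified, 1.0) < 0.05
        hits_any += any(value < 0.05 for value in p.values())
    return hits_prespecified / n_trials, hits_any / n_trials


prespecified_rate, any_cutoff_rate = false_positive_rates()
print(f"Pre-specified cutoff only: {prespecified_rate:.1%} of null trials 'significant'")
print(f"Best of five cutoffs:      {any_cutoff_rate:.1%} of null trials 'significant'")
```

Because the pre-specified cutoff is one of the candidates, the "best of five" rate can never be lower than the pre-specified rate, and in practice it tends to be noticeably higher even though no true subgroup effect exists in the simulated data.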

Was the direction of the subgroup effect correctly pre-defined?

Subgroup effects that are significant but go in the direction opposite to what was hypothesized are less credible than correct predictions.

Was the subgroup analysis one of a small number of hypotheses tested?

Each additional comparison increases the likelihood of finding a difference by chance alone. See the multiplicity discussion for more information.
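As a rough illustration of how quickly chance findings accumulate, the sketch below computes the probability of at least one false-positive result across several subgroup tests. It assumes the tests are independent and each uses a significance threshold of 0.05; real subgroup tests are usually correlated, so the numbers are indicative only. The function name is hypothetical.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one false positive among n independent tests.

    Assumes every null hypothesis is true and the tests are independent.
    """
    if n_tests < 0:
        raise ValueError("n_tests must be non-negative")
    return 1.0 - (1.0 - alpha) ** n_tests


for k in (1, 5, 10, 17):
    print(f"{k:>2} subgroup tests: {familywise_error_rate(k):.0%} chance of >=1 false positive")
```

Under these assumptions, a report with 17 subgroup analyses would have a better-than-even chance of showing at least one "significant" subgroup even if the treatment effect were identical in every subgroup.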

Is the subgroup variable a characteristic measured at baseline or after randomization?

Variables measured after randomization may themselves be affected by the interventions. Subgroups defined by such variables are no longer protected by randomization, which introduces confounding.

Examples of variables measured at baseline:

Age
Sex
Pre-treatment LDL cholesterol

Examples of variables measured after randomization:

LDL cholesterol achieved after 12 weeks of study intervention in a fixed-dose statin trial
Success of revascularization in a trial comparing coronary artery bypass grafting surgery to percutaneous coronary intervention in coronary artery disease

Could treatment effect differences between subgroups be attributable to baseline imbalances?

Randomization gives every confounder an equal chance of being allocated to each intervention group in the overall trial, but it does not guarantee that the intervention groups are balanced within each subgroup. Subgroups are prone to imbalances in potential confounders, especially when they contain a small number of participants.

The exception is when randomization is stratified for the variable that defines the subgroups (e.g. stratified randomization by history of diabetes). In the case of a stratified subgroup, there is a reduced risk of confounder imbalance.
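The greater risk of confounder imbalance in small subgroups can be seen in a short simulation. This is a sketch under invented assumptions: a binary confounder with 30% prevalence, equal-sized arms, and hypothetical arm sizes of 1000 (whole trial) and 30 (small subgroup). It reports the average absolute difference between arms in the proportion of participants carrying the confounder.

```python
import random
import statistics


def mean_confounder_imbalance(n_per_arm, prevalence=0.3, n_sims=1000, seed=7):
    """Average absolute between-arm difference in the proportion with a confounder."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_sims):
        # Number of participants with the confounder in each randomized arm
        treated = sum(rng.random() < prevalence for _ in range(n_per_arm))
        control = sum(rng.random() < prevalence for _ in range(n_per_arm))
        gaps.append(abs(treated - control) / n_per_arm)
    return statistics.mean(gaps)


whole_trial = mean_confounder_imbalance(n_per_arm=1000)
small_subgroup = mean_confounder_imbalance(n_per_arm=30)
print(f"Whole trial (1000 per arm): average imbalance {whole_trial:.1%}")
print(f"Small subgroup (30 per arm): average imbalance {small_subgroup:.1%}")
```

The randomization process is identical in both cases; only the number of participants differs, and the typical imbalance is several times larger in the small subgroup.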

Is the subgroup effect statistically significant?

A review of 117 subgroup claims in 64 RCTs found that fewer than 40% of the subgroup claims reported in the abstract were statistically significant (Wallach JD et al.).

Statistical significance is determined by examining the p-value for the test for interaction (which tests whether treatment effect differs across subgroups), not the p-value or 95% CI within a subgroup (Brookes ST et al.). “Positive” subgroup analyses that do not report the test for interaction p-value should be ignored.

E.g. In HPS (Heart Protection Study Collaborative Group), subgroup analysis based on sex (1 of 17 subgroup analyses reported) did not show a statistically significant test for interaction (p=0.18), meaning that the overall trial results applied to both males and females.
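To make the distinction between a within-subgroup result and a test for interaction concrete, the sketch below applies a common approximate z-test for the difference between two independent estimates on the log scale. The relative risks and confidence intervals are hypothetical (they are not taken from HPS or any other trial), the function names are invented for illustration, and standard errors are recovered from the 95% CIs under a normal approximation.

```python
import math


def se_from_ci(lower, upper, z_crit=1.96):
    """Standard error of a log relative risk, recovered from its 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z_crit)


def interaction_test(rr_a, ci_a, rr_b, ci_b):
    """Approximate z-test for whether two subgroup relative risks differ.

    Returns (z, two-sided p-value). Assumes the two subgroups are independent.
    """
    diff = math.log(rr_a) - math.log(rr_b)
    se_diff = math.sqrt(se_from_ci(*ci_a) ** 2 + se_from_ci(*ci_b) ** 2)
    z = diff / se_diff
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p


# Hypothetical subgroup results: A looks "significant" on its own, B does not
z, p = interaction_test(0.70, (0.50, 0.98), 0.95, (0.75, 1.20))
print(f"Interaction z = {z:.2f}, p = {p:.2f}")
```

In this invented example, subgroup A's confidence interval excludes 1 and subgroup B's does not, yet the interaction p-value is well above 0.05. There is therefore no good evidence that the treatment effect differs between the two subgroups, which is why the within-subgroup p-values should not be used to claim a subgroup effect.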

Is the subgroup effect consistent within and across trials?

Within a trial, consistent subgroup effect across multiple related outcomes (e.g. myocardial infarction, ischemic stroke and cardiovascular death) increases the credibility of there being a true subgroup effect.

E.g. Myocardial infarction, ischemic stroke, and cardiovascular death all being similarly reduced by an intervention on a subgroup of patients with diabetes.

A true subgroup effect is also more likely if additional studies replicate the effect; however, this rarely occurs. One review found that only approximately 10% of positive subgroup analyses were replicated in a subsequent trial designed to confirm the effect within the subgroup (Wallach JD et al.).

[Systematic Reviews/Meta-Analyses Only] Is the effect suggested by comparisons within rather than between studies?

Subgroup effects identified between studies, such as between two trials in a systematic review, may be due to methodological or clinical differences between the trials rather than true associations with the different subgroups.

E.g. The Physicians’ Health Study (Steering Committee of the Physicians’ Health Study Research Group), a study of men without previous cardiovascular disease, found that low-dose ASA statistically significantly reduced the risk of myocardial infarction but not stroke. Many years later, the Women’s Health Study demonstrated a statistically significant reduction in stroke but not myocardial infarction with ASA in women without previous cardiovascular disease. It would be inappropriate to conclude, based on an indirect comparison of these two RCTs, that ASA has different benefits in men compared with women.

License

Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License
