11 Results of the systematic review
The quality of a systematic review depends on both the quality of the individual studies and the aggregate characteristics of those studies. If the body of evidence is missing studies, consists predominantly of poorly conducted studies, or is highly heterogeneous, this will likely warrant lower confidence in the results.
Checklist Questions
- Do all inclusions & exclusions of trials make sense?
- Are you aware of any relevant studies that were not identified/included in this review?
- Did reviewers adequately assess individual trials for risk of bias?
- Was each component reported separately, or summarized with a composite quality score?
- Are there any differences between studies that should preclude meta-analysis?
Risk of bias within trials (internal validity): Did reviewers adequately assess for (& report) risk of bias?
Risk of bias should be evaluated using a tool that is specific to RCTs. The Cochrane risk of bias tool (version 1 (Higgins JPT et al. 2011) or 2 (Sterne JAC et al. 2019)) evaluates the risk of individual trial biases and offers the most transparent assessment of trial internal validity (see NERDCAT-RCT for more information regarding internal validity). ROBINS-I (Sterne JA et al.) is a similar tool for appraising risk of bias in non-randomized (observational) studies.
Quality Scores
“Quality scores” such as the Jadad score reflect the quality of reporting more than the soundness of the methods, and conclusions about “quality” vary widely depending on which score is used. In particular, the Jadad score is considered obsolete and is a poor measure of risk of bias.
Methodological & clinical heterogeneity: Is it appropriate to perform a meta-analysis?
- Methodological heterogeneity: Are there methodological differences (e.g. risk of bias) between studies?
- Clinical heterogeneity: Are there any differences in clinical characteristics between the individual trials (i.e. any component of PICO) that preclude pooling the trials together in a meta-analysis?
- Is the impact of any of these characteristics tested in a subgroup analysis or meta-regression?
Testing possible sources of heterogeneity may identify causes for statistical heterogeneity identified in the meta-analysis (e.g. the intervention may only appear beneficial in trials at high risk of bias, but not in those at low risk).
See NERDCAT-RCT to learn more about how to appraise the validity of subgroup effects.
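As a rough illustration of the point above, the following Python sketch pools a handful of trials overall and within risk-of-bias subgroups, and reports Cochran's Q and I² for each. Everything in it is an assumption made for illustration: the trial names, effect sizes (log risk ratios), and standard errors are invented, the helper names `pool_fixed` and `subgroup_summary` are hypothetical, and a fixed-effect inverse-variance model is used only because it is the simplest to show. A real review would rely on dedicated meta-analysis software and would often use a random-effects model.

```python
import math

# Hypothetical trial data (not from any real review):
# (label, log risk ratio, standard error, risk-of-bias judgment)
TRIALS = [
    ("Trial A", -0.40, 0.15, "high"),
    ("Trial B", -0.35, 0.20, "high"),
    ("Trial C", -0.05, 0.10, "low"),
    ("Trial D", 0.02, 0.12, "low"),
]


def pool_fixed(effects, ses):
    """Fixed-effect inverse-variance pooling with Cochran's Q and I^2 (%)."""
    weights = [1.0 / se ** 2 for se in ses]
    total_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total_w
    # Q: weighted squared deviations of each trial from the pooled estimate
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I^2: share of total variability beyond what chance alone would explain
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {"pooled": pooled, "se": math.sqrt(1.0 / total_w),
            "q": q, "df": df, "i2": i2}


def subgroup_summary(trials):
    """Pool trials separately within each risk-of-bias category."""
    groups = {}
    for _, y, se, rob in trials:
        ys, ses = groups.setdefault(rob, ([], []))
        ys.append(y)
        ses.append(se)
    return {rob: pool_fixed(ys, ses) for rob, (ys, ses) in groups.items()}


overall = pool_fixed([t[1] for t in TRIALS], [t[2] for t in TRIALS])
by_rob = subgroup_summary(TRIALS)

# Under a fixed-effect model, total Q splits into within- and between-subgroup
# parts; a large between-subgroup Q suggests the subgroups differ.
q_between = overall["q"] - sum(g["q"] for g in by_rob.values())

if __name__ == "__main__":
    print(f"Overall RR {math.exp(overall['pooled']):.2f}, I2 {overall['i2']:.0f}%")
    for rob, g in by_rob.items():
        print(f"{rob} risk of bias: RR {math.exp(g['pooled']):.2f}, I2 {g['i2']:.0f}%")
    print(f"Q between subgroups: {q_between:.2f} (df = {len(by_rob) - 1})")
```

With these invented numbers, the apparent benefit is concentrated in the trials at high risk of bias, and most of the overall heterogeneity lies between the two subgroups rather than within them. That is the pattern that should prompt caution about a pooled estimate.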
Systematic review: A review that systematically identifies all potentially relevant studies on a research question. The aggregate of studies is then evaluated with respect to factors such as the risk of bias of individual studies or heterogeneity among results. The qualitative combination of results is a systematic review.
Heterogeneity: Variability between studies in a systematic review. It can refer to clinical differences, methodological differences, or variable results between studies. Heterogeneity occurs on a continuum and, in the case of heterogeneity among results, can be expressed numerically via measures of statistical heterogeneity. See here for a further discussion of statistical heterogeneity.
Bias: Systematic deviation of an estimate from the truth (either an overestimation or underestimation) caused by a study design or conduct feature. See the Catalog of Bias for specific biases, explanations, and examples.
Meta-analysis: A quantitative combination of the data obtained in a systematic review.
Randomized controlled trial (RCT): A trial in which participants are randomly allocated to two or more groups that are given different treatments.
Internal validity: The extent to which the study results are attributable to the intervention and not to bias. If internal validity is high, there is high confidence that the results are due to the effects of treatment (with low internal validity entailing low confidence).
PICO: An acronym for "patient, intervention, comparator, and outcome". These are the four basic elements of a study. For instance, a study may examine an elderly population (P) to understand the effects of statin therapy (I) compared to placebo (C) in terms of cardiovascular events (O). Sometimes extended to PICO(T) to include the time at which outcomes were assessed, or (D)PICO to incorporate the study design.