1 Do the results (not) apply to my patients?
Generalizability is often understood in terms of PICO, which is an acronym for “patient, intervention, comparator, and outcome”. These are the four basic elements of a study. For instance, a study may examine an elderly population (P) to understand the effects of statin therapy (I) compared to placebo (C) in terms of cardiovascular events (O). The following questions are intended to comprehensively address each of these elements.
Most considerations of generalizability are independent of study type. So, unless explicitly noted otherwise, the following questions are applicable to both randomized controlled trials and systematic reviews/meta-analyses.
Checklist Questions
- How does my practice setting differ from that in the trials?
- How do my patients differ from those included in the trial?
- How do the trial interventions differ from those available in my practice?
- Are the trial outcomes clinically important?
- Does the trial reflect my patients’ risk of adverse events? What differences exist?
- [Randomized controlled trials only] Did the study design have a pre-randomization run-in period?
- [Systematic reviews/meta-analyses only] Was each element of PICO (i.e. patient, intervention, comparator, and outcome) sufficiently reported to assess generalizability?
- Do the differences above impede the generalizability of the study findings to my practice?
How does my practice setting differ from that in the trials?
Setting considerations:
- Country and type of healthcare system
- Primary, secondary, or tertiary care
- Outpatient vs. inpatient
- Inpatient unit type
How do my patients differ from those included in the trial?
Patient selection considerations:
- Diagnostic methods
- Inclusion / Exclusion criteria
- Enrichment strategies
- Proportion of patients not enrolled because of exclusion criteria
- Proportion of patients declining to participate
Patient characteristic considerations:
- Age
- Sex/Gender
- Race/ethnicity
- Stage/severity of disease
- Similar underlying pathologies (e.g. patients with a history of hemorrhagic stroke vs. patients with a history of ischemic stroke)
- Comorbidities
- Past interventions (e.g. proportion of patients previously having tried at least 3 antidepressants)
- Interventions at baseline (e.g. the proportion of patients taking aspirin at baseline in a trial of an SGLT2 inhibitor vs. placebo)
- Baseline clinical characteristics (e.g. blood pressure, weight)
- Event rate in the control group
E.g. #2 PARADIGM-HF was an RCT assessing the effects of sacubitril-valsartan vs. enalapril in patients with heart failure with reduced ejection fraction (McMurray JJV et al.). For the primary outcome of cardiovascular death or heart failure (HF) hospitalization, the HR was 0.80 (95% CI 0.73-0.87) in favor of sacubitril-valsartan. To be included, patients were required to have elevated natriuretic peptides, such as an NT-proBNP ≥600 pg/mL (or ≥400 pg/mL if hospitalized within the last year). This was incorporated as an enrichment criterion (and not as a therapeutic target), as a higher serum natriuretic peptide concentration is associated with greater risk of HF-related events (Oremus M et al.), thus increasing trial event rates and reducing the sample size required to detect a difference between groups. However, elevated BNP is not the only prognostic factor in HF, as patients with “low” BNP can still be at high risk of HF hospitalization and death. Consider the following three patients with similar predicted risk (~35%) for HF hospitalization or death at 5 years:
Characteristic | Patient #1 | Patient #2 | Patient #3 |
Age | 65 | 65 | 65 |
Sex | Male | Male | Male |
Ejection Fraction | 35% | 35% | 35% |
Type 2 Diabetes | No | Yes | No |
NT-proBNP (pg/mL) | 1000 | 100 | 100 |
New York Heart Association Class | 2 | 2 | 3 |
Since the RRR for this outcome with sacubitril-valsartan compared with an ACE inhibitor is the same regardless of NT-proBNP level, all 3 patients would be expected to have the same absolute benefit from sacubitril-valsartan despite patients #2 and #3 having NT-proBNP levels below trial inclusion criteria.
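The arithmetic behind this conclusion can be sketched as follows. This is a minimal illustration, not a validated risk calculator: it assumes the ~35% figure is each patient's 5-year risk on the comparator, and it approximates the RRR as 20% by treating the trial HR of 0.80 as if it were a relative risk. The function and variable names are hypothetical.

```python
def absolute_risk_reduction(baseline_risk: float, rrr: float) -> float:
    """Absolute risk reduction = baseline risk x relative risk reduction."""
    if not 0.0 <= baseline_risk <= 1.0 or not 0.0 <= rrr <= 1.0:
        raise ValueError("baseline_risk and rrr must be between 0 and 1")
    return baseline_risk * rrr


# Assumption: the HR of 0.80 is treated as a relative risk, so RRR is about 0.20.
ASSUMED_RRR = 0.20

# Predicted 5-year risk (~35%) and NT-proBNP values are taken from the table.
# NT-proBNP is recorded but never used in the calculation, because the
# relative effect is assumed to be the same at every NT-proBNP level.
patients = {
    "Patient #1": {"nt_probnp": 1000, "baseline_risk": 0.35},
    "Patient #2": {"nt_probnp": 100, "baseline_risk": 0.35},
    "Patient #3": {"nt_probnp": 100, "baseline_risk": 0.35},
}

benefits = {
    name: absolute_risk_reduction(p["baseline_risk"], ASSUMED_RRR)
    for name, p in patients.items()
}

for name, arr in benefits.items():
    print(f"{name}: absolute risk reduction = {arr:.1%}")
```

Because the absolute benefit depends only on baseline risk and the relative effect, all three patients get the same result under these assumptions, regardless of whether their NT-proBNP met the trial threshold.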
How do the trial interventions differ from those available in my practice?
Intervention considerations:
- Intervention used (e.g. drug, dose, formulation (if relevant), duration)
- Timing of intervention
- Monitoring frequency
- Appropriate comparator
- Co-interventions – either pharmacological or non-pharmacological (e.g. both the intervention and comparator groups receiving lifestyle counselling in a trial evaluating the effects of a medication on weight loss)
- Changes in therapeutics / diagnostics since trial publication
Are the trial outcomes clinically important?
Outcome considerations:
- Clinical relevance of surrogate outcomes
- Clinical utility of measurement scales
- Consideration of all patient-centered outcomes
- Adequate follow-up duration
- Outcome assessor (i.e. patient or clinician)
When assessing the relative importance of outcomes and whether all important outcomes were evaluated, it can be useful to construct a hierarchy of outcomes. Such hierarchies are specific to the clinical circumstance and patient preference, but the following is one example of a hierarchical ranking of outcomes:
1) Death or quality of life, depending on the goals of therapy
2) Serious adverse events
3) Clinically-important morbidity (e.g. heart failure hospitalizations, major bleed, symptom scores), withdrawals due to adverse events
4) Total adverse events, specific adverse events
5) Surrogate markers (e.g. change in a biomarker, progression-free survival in oncology trials)
Does the trial reflect my patients’ risk of adverse events?
Adverse event considerations:
- Reporting of all clinically important adverse events
- Treatment discontinuations
- Trial site / clinician skill with treatment
- Exclusion of patients at elevated risk of adverse events
- Whether the duration of trial was adequate to detect adverse events of interest
[Randomized Controlled Trials Only] Did the study design have a pre-randomization run-in period?
If the trial included a run-in period, examine the proportion of patients excluded during this phase, along with the reasons for their exclusion.
Placebo run-in periods are usually used to:
- Obtain a pre-treatment baseline for clinical status (e.g. number of migraines/month in a trial of migraine prophylaxis)
- Ensure that the participants are sufficiently adherent to the assigned regimen
Active run-in periods are usually used to:
- Ensure short-term tolerability
- Ensure that the participants are sufficiently adherent to the assigned regimen
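Tallying run-in exclusions by reason can be sketched as below. The counts and reason labels are entirely hypothetical and do not come from any trial; in practice they would be read from the trial's participant flow diagram. The function name is also an illustrative assumption.

```python
from collections import Counter

# Hypothetical run-in data: one entry per participant who entered the run-in.
# None means the participant completed the run-in and was randomized.
run_in_outcomes = (
    [None] * 800
    + ["adverse event"] * 90
    + ["poor adherence"] * 70
    + ["withdrew consent"] * 40
)


def summarize_run_in(outcomes):
    """Return the proportion excluded during run-in, overall and by reason."""
    total = len(outcomes)
    if total == 0:
        raise ValueError("no participants entered the run-in")
    reasons = Counter(r for r in outcomes if r is not None)
    excluded = sum(reasons.values())
    return {
        "entered": total,
        "excluded": excluded,
        "proportion_excluded": excluded / total,
        "by_reason": {reason: n / total for reason, n in reasons.most_common()},
    }


summary = summarize_run_in(run_in_outcomes)
print(f"Excluded during run-in: {summary['proportion_excluded']:.1%}")
for reason, share in summary["by_reason"].items():
    print(f"  {reason}: {share:.1%} of those who entered")
```

A large share excluded for intolerance or poor adherence suggests the randomized population was selected to tolerate and take the treatment better than patients seen in routine practice.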
[Systematic Reviews/Meta-Analyses Only] Was each PICO element sufficiently reported to assess generalizability?
If the PICO characteristics are not reported sufficiently, or the review inclusion criteria are too broad, it may not be possible to evaluate whether the results apply to a given patient or practice. As such, if the PICO elements are poorly described or excessively broad, consider looking for another systematic review with better reporting and scope.
Do the differences above impede the generalizability of the study findings to my practice?
There will almost always be some differences between one’s practice and the PICO of the trial. Use clinical judgement to evaluate whether these differences render the study results inapplicable to your practice or to an individual patient. If the differences are substantial, attempt to predict their effect (i.e. greater or less efficacy/harm).
E.g. LoDoCo2 (Nidorf SM et al.) was an RCT of colchicine 0.5 mg vs. placebo in patients with chronic coronary artery disease. Colchicine reduced the primary cardiovascular composite endpoint compared with placebo (HR 0.7, 95% CI 0.6 to 0.8), with an absolute difference of 1.5% at approximately 2 years.
It is uncertain whether these results would translate to cardiovascular benefit in patients without coronary artery disease. Even if colchicine were efficacious in patients without coronary artery disease, the absolute difference would be anticipated to be lower due to a lower event rate, and the benefit:harm trade-off may in turn also be quite different.
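How the absolute difference shrinks with a lower event rate can be sketched numerically. The example below assumes proportional hazards, under which the treated-group risk is 1 - (1 - control risk)^HR. It also assumes the HR of 0.7 carries over unchanged to lower-risk populations, which is exactly what is uncertain. The control-group risks are hypothetical values chosen for illustration, not trial data, and the function names are illustrative.

```python
def treated_risk(control_risk: float, hazard_ratio: float) -> float:
    """Cumulative risk on treatment, assuming proportional hazards."""
    if not 0.0 <= control_risk <= 1.0:
        raise ValueError("control_risk must be between 0 and 1")
    if hazard_ratio <= 0.0:
        raise ValueError("hazard_ratio must be positive")
    return 1.0 - (1.0 - control_risk) ** hazard_ratio


def absolute_difference(control_risk: float, hazard_ratio: float) -> float:
    """Control-group risk minus treated-group risk."""
    return control_risk - treated_risk(control_risk, hazard_ratio)


HR = 0.7  # point estimate quoted in the text; assumed constant across populations

# Hypothetical control-group risks, from higher-risk to lower-risk populations.
hypothetical_control_risks = [0.10, 0.05, 0.02, 0.01]

rows = []
for risk in hypothetical_control_risks:
    diff = absolute_difference(risk, HR)
    rows.append((risk, diff))
    print(f"control risk {risk:.1%} -> absolute difference {diff:.2%}")
```

The relative effect is held fixed in every row, yet the absolute difference falls as the control-group risk falls, which is why the benefit:harm trade-off can change in lower-risk patients.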
Generalizability: The extent to which the trial results are applicable beyond the patients included in the study. Also known as external validity.
PICO: An acronym for "patient, intervention, comparator, and outcome". These are the four basic elements of a study. For instance, a study may examine an elderly population (P) to understand the effects of statin therapy (I) compared to placebo (C) in terms of cardiovascular events (O). Sometimes extended to PICO(T) to include the time at which outcomes were assessed, or (D)PICO to incorporate the study design.
Randomized controlled trial (RCT): A trial in which participants are randomly allocated to two or more groups that are given different treatments.
Systematic review: A review that systematically identifies all potentially relevant studies on a research question. The aggregate of studies is then evaluated with respect to factors such as risk of bias of individual studies or heterogeneity among results. The qualitative combination of results is a systematic review.
Meta-analysis: A quantitative combination of the data obtained in a systematic review.
Run-in period: A pre-randomization trial phase in which all patients are assigned to active treatment, placebo, or no treatment (observation only). A run-in phase may be implemented for several reasons, including to restrict randomization to patients who can adhere to study follow-up or treatment, or to exclude patients who cannot tolerate the intervention. Run-in periods by design select a certain subgroup of patients for enrolment. This introduces selection bias (i.e. potential issues with generalizability), which may be important in some cases. Note that this selection occurs prior to randomization, and therefore does not introduce differences between randomized groups (i.e. allocation bias).
Primary care: The most accessible healthcare setting, where generalist services are provided. For example, a family medicine clinic.
Secondary care: Healthcare services provided by specialists in settings less advanced than tertiary care. For example, an outpatient cardiology clinic.
Tertiary care: Care provided in a specialized institutional centre. For example, neurosurgery or severe burn treatment.
Enrichment strategies: Trial strategies to identify populations in which the intervention will show the greatest effect. There is no single method. One method is to enroll subjects and put them all on active treatment, then randomize only those who responded to treatment to either continue active treatment or switch to placebo (withdrawal trial). Another method is to use risk factors for the outcome of interest as inclusion criteria (enrichment criteria) (e.g. recent diabetes trials assessing cardiovascular outcomes have selectively enrolled patients with established atherosclerotic cardiovascular disease (ASCVD) or multiple additional ASCVD risk factors).
Primary outcome: An outcome on which trial design choices are based (e.g. sample size calculations). Primary outcomes are not necessarily the most important outcomes.
Hazard ratio (HR): A relative measure of effect. The hazard is the instantaneous event rate at a given point during the trial, and the HR compares the hazards of the two groups averaged over the whole follow-up period. This differentiates it from other measures, such as relative risk, which rely only on cumulative event rates.
Relative risk reduction (RRR): The proportional reduction in risk with the intervention relative to the comparator, calculated as 1 minus the relative risk (RR). If the intervention has an RR of 70% relative to the comparator (the reference, 100%), then the relative risk reduction is 30% (100% - 70%).
Absolute risk difference: The risk in one group minus the risk in another group over a specified period of time. For example, if the absolute risk of myocardial infarction over 5 years was 15% for the comparator and 10% for the intervention, then the absolute risk difference was 5% (15% - 10%) over 5 years.
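The relationship between these relative and absolute measures can be sketched using the myocardial infarction figures above (15% vs. 10% over 5 years). This is a minimal illustration; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EffectMeasures:
    relative_risk: float
    relative_risk_reduction: float
    absolute_risk_difference: float


def effect_measures(comparator_risk: float, intervention_risk: float) -> EffectMeasures:
    """Derive relative and absolute effect measures from two cumulative risks."""
    if not 0.0 < comparator_risk <= 1.0:
        raise ValueError("comparator_risk must be in (0, 1]")
    if not 0.0 <= intervention_risk <= 1.0:
        raise ValueError("intervention_risk must be in [0, 1]")
    rr = intervention_risk / comparator_risk
    return EffectMeasures(
        relative_risk=rr,
        relative_risk_reduction=1.0 - rr,
        absolute_risk_difference=comparator_risk - intervention_risk,
    )


# Risks over 5 years, taken from the example in the text.
result = effect_measures(comparator_risk=0.15, intervention_risk=0.10)
print(f"Relative risk:            {result.relative_risk:.2f}")
print(f"Relative risk reduction:  {result.relative_risk_reduction:.1%}")
print(f"Absolute risk difference: {result.absolute_risk_difference:.1%}")
```

The same relative risk reduction would correspond to a smaller absolute risk difference if the comparator risk were lower, which is why both measures are worth reporting together.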
Surrogate outcome (surrogate marker): A marker or outcome that acts as a proxy for a clinical outcome, under the assumption that the proxy is sufficiently predictive of the clinical outcome. For example, LDL cholesterol lowering may be used as a surrogate marker for lowering the risk of cardiovascular events. Surrogate markers are typically used because they are more convenient to measure.
Serious adverse event: A standardized definition encompassing any adverse event that:
(1) Results in death or is life-threatening;
(2) Requires or prolongs hospitalization;
(3) Results in persistent, significant, or permanent disability or incapacity;
(4) Causes congenital malformation;
(5) Is, in the clinician's judgement, an important medical event.
Progression-free survival: A measure of time to disease progression or death. This outcome is frequently used in cancer trials, where disease progression is typically defined as an increase in radiographic tumor mass above a certain threshold.