1 Do the results (not) apply to my patients?

Generalizability is often understood in terms of PICO, which is an acronym for “patient, intervention, comparator, and outcome”. These are the four basic elements of a study. For instance, a study may examine an elderly population (P) to understand the effects of statin therapy (I) compared to placebo (C) in terms of cardiovascular events (O). The following questions are intended to comprehensively address each of these elements.

Most considerations of generalizability are independent of study type. So, unless explicitly noted otherwise, the following questions are applicable to both randomized controlled trials and systematic reviews/meta-analyses.

Checklist Questions

How does my practice setting differ from that in the trials?
How do my patients differ from those included in the trial?
How do the trial interventions differ from those available in my practice?
Are the trial outcomes clinically important?
Does the trial reflect my patients’ risk of adverse events? What differences exist?
[Randomized controlled trials only] Did the study design have a pre-randomization run-in period?
[Systematic reviews/meta-analyses only] Was each element of PICO (i.e. patient, intervention, comparator, and outcome) sufficiently reported to assess generalizability?
Do the differences above impede the generalizability of the study findings to my practice?

How does my practice setting differ from that in the trials?

Setting considerations:

  • Country and type of healthcare system
  • Primary, secondary, or tertiary care
  • Outpatient vs. inpatient
  • Inpatient unit type

How do my patients differ from those included in the trial?

Patient selection considerations:

  • Diagnostic methods
  • Inclusion / Exclusion criteria
  • Enrichment strategies
  • Proportion of patients not enrolled because of exclusion criteria
  • Proportion of patients declining to participate

Patient characteristic considerations:

  • Age
  • Sex/Gender
  • Race/ethnicity
  • Stage/severity of disease
  • Similar underlying pathologies (e.g. patients with a history of hemorrhagic stroke vs. patients with a history of ischemic stroke)
  • Comorbidities
  • Past interventions (e.g. proportion of patients previously having tried at least 3 antidepressants)
  • Interventions at baseline (e.g. the proportion of patients taking aspirin at baseline in a trial of a SGLT2 inhibitor vs. placebo)
  • Baseline clinical characteristics (e.g. blood pressure, weight)
  • Event rate in the control group

E.g. #1 PARACHUTE (Yeh RW et al.) was a parody RCT examining whether the use of parachutes, compared to empty backpacks, prevented death and major trauma when jumping from an aircraft. The study did not find a difference in outcomes between the two groups. However, a major limitation was that all participants jumped from a motionless (mean velocity 0 km/h), grounded (mean altitude 0.1 m) plane. Non-participants (those who declined or were ineligible) were on average moving much faster (800 km/h) and were at a much greater altitude (9146 m). Consequently, the results of this trial do not apply to the setting where a parachute would be used in practice (jumping out of an airborne plane).

E.g. #2 PARADIGM-HF was an RCT assessing the effects of sacubitril-valsartan vs. enalapril in patients with heart failure with reduced ejection fraction (McMurray JJV et al.). For the primary outcome of cardiovascular death or heart failure (HF) hospitalization, the hazard ratio (HR) was 0.80 (95% CI 0.73-0.87) in favor of sacubitril-valsartan. To be included, patients were required to have elevated natriuretic peptides, such as an NT-proBNP ≥600 pg/mL (or ≥400 pg/mL if hospitalized within the last year). This was incorporated as an enrichment criterion (and not as a therapeutic target), because a higher serum natriuretic peptide concentration is associated with greater risk of HF-related events (Oremus M et al.); enrolling such patients increases trial event rates and reduces the sample size required to detect a difference between groups. However, elevated BNP is not the only prognostic factor in HF, as patients with “low” BNP can still be at high risk of HF hospitalization and death. Consider the following three patients with similar predicted risk (~35%) for HF hospitalization or death at 5 years:

Table 1. Comparison of three patients with similar projected risk of heart failure hospitalization or death.
Estimates calculated using BCN-Bio-HF calculator on hfmedchoice.com
| Characteristic | Patient #1 | Patient #2 | Patient #3 |
| --- | --- | --- | --- |
| Age | 65 | 65 | 65 |
| Sex | Male | Male | Male |
| Ejection Fraction | 35% | 35% | 35% |
| Type 2 Diabetes | No | Yes | No |
| NT-proBNP (pg/mL) | 1000 | 100 | 100 |
| New York Heart Association Class | 2 | 2 | 3 |

Since the relative risk reduction (RRR) for this outcome with sacubitril-valsartan compared with an ACE inhibitor is the same regardless of NT-proBNP level, all three patients would be expected to derive the same absolute benefit from sacubitril-valsartan, even though patients #2 and #3 have NT-proBNP levels below the trial inclusion threshold.
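The arithmetic behind this reasoning can be sketched in a few lines of Python. This is a minimal illustration only: it assumes that the RRR can be approximated as 1 minus the HR (0.80 from PARADIGM-HF) and that this relative effect can be applied to the ~35% five-year predicted risk, which extrapolates beyond the trial's follow-up. The function name and the number needed to treat (NNT) calculation are illustrative and are not taken from the trial or from the BCN-Bio-HF calculator.

```python
def absolute_benefit(baseline_risk, relative_risk_reduction):
    """Return (absolute risk reduction, number needed to treat)."""
    arr = baseline_risk * relative_risk_reduction
    nnt = 1 / arr
    return arr, nnt


# Assumption: RRR approximated as 1 - HR, using HR 0.80 from PARADIGM-HF.
rrr = 1 - 0.80

# Approximate 5-year predicted risk for each patient in Table 1.
patients = {"Patient #1": 0.35, "Patient #2": 0.35, "Patient #3": 0.35}

for name, risk in patients.items():
    arr, nnt = absolute_benefit(risk, rrr)
    print(f"{name}: ARR = {arr:.1%}, NNT = {nnt:.0f}")
```

Under these assumptions, each patient has an absolute risk reduction of about 7 percentage points (an NNT of roughly 14), because the calculation depends only on baseline risk and the relative effect, not on the NT-proBNP level.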

How do the trial interventions differ from those available in my practice?

Intervention considerations:

  • Intervention used (e.g. drug, dose, formulation (if relevant), duration)
  • Timing of intervention
  • Monitoring frequency
  • Appropriate comparator
  • Co-interventions – either pharmacological or non-pharmacological (e.g. both the intervention and comparator groups receiving lifestyle counselling in a trial evaluating the effects of a medication on weight loss)
  • Changes in therapeutics / diagnostics since trial publication

Are the trial outcomes clinically important?

Outcome considerations:

  • Clinical relevance of surrogate outcomes
  • Clinical utility of measurement scales
  • Consideration of all patient-centered outcomes
  • Adequate follow-up duration
  • Outcome assessor (i.e. patient or clinician)

When assessing the relative importance of outcomes and whether all important outcomes were evaluated, it can be useful to construct a hierarchy of outcomes. The hierarchy is specific to the clinical circumstance and patient preference, but the following is one example of a hierarchical ranking of outcomes:
1) Death or quality of life, depending on the goals of therapy
2) Serious adverse events
3) Clinically important morbidity (e.g. heart failure hospitalizations, major bleed, symptom scores), withdrawals due to adverse events
4) Total adverse events, specific adverse events
5) Surrogate markers (e.g. change in a biomarker, progression-free survival in oncology trials)

E.g. A systematic review and quantitative analysis (Kovic B et al.) examined the value of progression-free survival (PFS) as a surrogate endpoint for predicting health-related quality of life (HR-QoL) in cancer treatment trials. The slope of association between PFS and global HR-QoL was 0.1 (95% CI, −0.3 to 0.5), a result that was not statistically significant and that suggests PFS is a poor surrogate for HR-QoL. Together with concerns that PFS is an unreliable predictor of overall survival, this casts doubt on the use of PFS as a predictor of patient-important outcomes. Despite this, PFS remains a key endpoint of many oncology trials, and many oncology drugs are approved based on their impact on PFS without data on HR-QoL or overall survival.

Does the trial reflect my patients’ risk of adverse events?

Adverse event considerations:

  • Reporting of all clinically important adverse events
  • Treatment discontinuations
  • Trial site / clinician skill with treatment
  • Exclusion of patients at elevated risk of adverse events
  • Whether the duration of trial was adequate to detect adverse events of interest

[Randomized Controlled Trials Only] Did the study design have a pre-randomization run-in period?

If the trial included a run-in period, examine the proportion of patients excluded during this phase, along with the reasons for their exclusion.

Placebo run-in periods are usually used to:

  • Obtain a pre-treatment baseline for clinical status (e.g. number of migraines/month in a trial of migraine prophylaxis)
  • Ensure that the participants are sufficiently adherent to the assigned regimen

Active run-in periods are usually used to:

  • Ensure short-term tolerability
  • Ensure that the participants are sufficiently adherent to the assigned regimen

E.g. PARADIGM-HF (McMurray JJV et al.) was an RCT assessing the effects of sacubitril-valsartan vs. enalapril in patients with heart failure with reduced ejection fraction with respect to the primary outcome of cardiovascular death or heart failure hospitalization. This trial featured a single-blind run-in with enalapril followed by a single-blind run-in with sacubitril-valsartan. Approximately 11% of participants were excluded during the run-ins due to adverse events. After randomization, symptomatic hypotension occurred in 14% of patients receiving sacubitril-valsartan versus 9% of patients receiving enalapril. However, these rates are among patients who were able to tolerate both enalapril and sacubitril-valsartan during the run-in periods, and therefore likely underestimate the true rate of this adverse event among patients with reduced ejection fraction who are newly starting either medication.

[Systematic Reviews/Meta-Analyses Only] Was each PICO element sufficiently reported to assess generalizability?

If the PICO characteristics are not reported sufficiently, or the review inclusion criteria are too broad, it may not be possible to evaluate whether the results apply to a given patient or practice. In that case, consider looking for another systematic review with better reporting and a more focused scope.

E.g. A meta-analysis by Ortiz-Orendain J et al. compared antipsychotic polypharmacy vs. antipsychotic monotherapy for the treatment of schizophrenia. Trial inclusion was not restricted by patient characteristics (except being limited to those ≥18 years old), illness characteristics (e.g. severity or duration), treatment setting, or drug characteristics (e.g. drug, dose, or formulation). Furthermore, the results reported only average patient age and treatment setting, with no description of other demographic features or illness characteristics. As a result, although comprehensive in its breadth, the review pooled a broad set of disparate trials with heterogeneous comparisons, making it difficult to apply the results to practice or to determine whether patient-specific or treatment characteristics affected outcomes.

Do the differences above impede the generalizability of the study findings to my practice?

There will almost always exist some differences between one’s practice and the PICO of the trial. Use clinical judgement to evaluate whether these differences render the study results inapplicable to your practice or to an individual patient. If there are sufficient differences, then an attempt should be made to predict the effect of these differences (i.e. greater or less efficacy/harm).

E.g. LoDoCo2 (Nidorf SM et al.) was an RCT of colchicine 0.5 mg vs. placebo in patients with chronic coronary artery disease. Colchicine reduced the primary cardiovascular composite endpoint compared with placebo (HR 0.7, 95% CI 0.6 to 0.8), with an absolute difference of 1.5% at approximately 2 years.

It is uncertain whether these results would translate to cardiovascular benefit in patients without coronary artery disease. Even if colchicine were efficacious in patients without coronary artery disease, the absolute difference would be anticipated to be smaller because of a lower event rate, and the benefit:harm trade-off may in turn be quite different.
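The dependence of the absolute difference on the baseline event rate can be illustrated with a short Python sketch. It assumes, purely for illustration, that the HR of 0.7 applies unchanged to lower-risk patients (which the paragraph above notes is uncertain) and that hazards are proportional, so that the treated-group risk is 1 − (1 − control risk)^HR. The control-group risks below are hypothetical values and are not taken from LoDoCo2.

```python
def risk_under_treatment(control_risk, hazard_ratio):
    """Convert a control-group risk to a treated-group risk,
    assuming proportional hazards over the same time horizon."""
    return 1 - (1 - control_risk) ** hazard_ratio


def absolute_difference(control_risk, hazard_ratio):
    """Absolute risk difference between control and treated groups."""
    return control_risk - risk_under_treatment(control_risk, hazard_ratio)


HR = 0.7  # point estimate reported for the LoDoCo2 primary endpoint

# Hypothetical control-group risks, chosen only to show the trend.
for control_risk in (0.10, 0.05, 0.02, 0.01):
    diff = absolute_difference(control_risk, HR)
    print(f"control risk {control_risk:.0%}: absolute difference {diff:.2%}")
```

With a fixed relative effect, the absolute difference shrinks as the control-group risk falls, while the risk of treatment-related harm need not shrink with it. This is why the benefit:harm trade-off can differ in a lower-risk population.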

License

NERDCAT Copyright © 2022 by Ricky Turgeon is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.
