7 Truncated studies: Was the trial stopped early for “overwhelming” evidence of benefit or futility?
Studies may be stopped early for efficacy as part of an ethical obligation to not expose participants to less effective treatment (or placebo) any longer than necessary. In other words, once it is sufficiently clear that an intervention is efficacious, there is reason to end the trial.
However, stopping early runs the risk of overestimating the effect size of the intervention. The estimated effect varies randomly around the true effect as a trial progresses (with greater fluctuation early on, when there are fewer events), so an interim look may trigger a premature stop based on an exaggerated estimate of the true effect size.
Consider the following simulated trial where there is no true difference between the groups (i.e. RR = 1.0):
As depicted in Graph 2 above, there is random deviation from the true effect as events accumulate. If the trial had interim analyses for benefit every 100 events, and the threshold for statistical significance was kept at the standard p<0.05 without adjustment for the interim looks, then the trial might have stopped at 100 events when the RR was 1.3, which we know to be an exaggeration of the true effect (RR = 1.0, i.e. no effect).
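The dynamic above can be reproduced with a short Monte Carlo sketch (a simplified illustration, not the simulation behind the graph: the event risk, batch sizes, and number of looks below are arbitrary assumptions, and significance is tested with a two-proportion z-test at an unadjusted p<0.05):

```python
import math
import random

def two_sided_p(e1, n1, e2, n2):
    """Two-proportion z-test p-value (normal approximation)."""
    pooled = (e1 + e2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0
    z = (e1 / n1 - e2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2))

def run_null_trial(rng, risk=0.2, batch=250, looks=5, alpha=0.05):
    """One trial with no true effect (RR = 1.0), with an unadjusted
    interim look for benefit after each batch of participants.
    Returns (stopped_early, RR at stop or at final analysis)."""
    e_t = e_c = n = 0
    for look in range(looks):
        for _ in range(batch):
            e_t += rng.random() < risk  # treatment arm events
            e_c += rng.random() < risk  # control arm events
        n += batch
        rr = (e_t / n) / (e_c / n) if e_c else float("inf")
        stop_for_benefit = e_t < e_c and two_sided_p(e_t, n, e_c, n) < alpha
        if look < looks - 1 and stop_for_benefit:
            return True, rr
    return False, rr

rng = random.Random(0)
n_sim = 4000
early_rrs = [rr for stopped, rr in (run_null_trial(rng) for _ in range(n_sim)) if stopped]
early_stop_rate = len(early_rrs) / n_sim
mean_rr_at_stop = sum(early_rrs) / len(early_rrs)
print(f"stopped early 'for benefit' in {early_stop_rate:.1%} of null trials")
print(f"mean RR at early stop: {mean_rr_at_stop:.2f} (true RR = 1.0)")
```

Even though the true RR is 1.0, a meaningful fraction of these null trials stop early, and the RR captured at the moment of stopping is systematically far from 1.0.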
As a simplified example, imagine studying a chess player and trying to assess if they are an above-average player (and by what margin) by judging their win percentage. One approach is to wait 50 matches, then assess their win percentage and judge accordingly. However, this could waste time as it might be unnecessary to wait that long if they are quite skilled (e.g. winning 90% of their first 10 games). So instead there could be an assessment of skill every 5 matches (up to a maximum of 50 matches). If they seem sufficiently impressive at one of these midpoint assessments, then the observation could be stopped. While this might save time, it also has a risk: if by pure chance the player goes on a win streak, then the observation is likely to end early. Even if our player is truly above-average in skill, an early stop is most likely to occur when they are on such a hot streak, consequently introducing bias into our assessment (e.g. assessing their win probability to be 80% due to the win streak, when in fact it is only 60%).
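The chess analogy can be made concrete with a quick simulation (all numbers are assumptions for illustration: a true win rate of 60%, a check every 5 matches, and an arbitrary "impressive" stopping threshold of an 80% observed win rate):

```python
import random

def observe_player(rng, true_p=0.6, check_every=5, max_games=50, stop_at=0.8):
    """Watch games one at a time; stop early if the running win rate
    looks 'impressive' at an interim check. Returns (stopped_early, estimate)."""
    wins = 0
    for game in range(1, max_games + 1):
        wins += rng.random() < true_p
        if game % check_every == 0 and game < max_games and wins / game >= stop_at:
            return True, wins / game   # stopped on a hot streak
    return False, wins / max_games     # watched all 50 games

rng = random.Random(1)
results = [observe_player(rng) for _ in range(10_000)]
early = [est for stopped, est in results if stopped]
overall = [est for _, est in results]
early_mean = sum(early) / len(early)
overall_mean = sum(overall) / len(overall)
print(f"true win rate: 0.60")
print(f"mean estimate when stopped early: {early_mean:.2f}")
print(f"mean estimate over all observations: {overall_mean:.2f}")
```

By construction, every early stop records a win rate of at least 80% for a player whose true skill is 60%, and even the average across all observation runs is pulled upward.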
This is the major concern with stopping early: there is a systematic tendency for an early stop to capture an overestimate of the effect. Precautions such as a predefined stopping rule with adjusted significance thresholds cannot prevent this bias entirely, but they can reduce its extent, as discussed below.
Checklist Questions
Was there a predefined interim analysis plan with a stopping rule?
Did the stopping rule involve few interim looks and a stringent p-value (e.g. <0.001)?
Did enough endpoint events occur?
Was there a predefined interim analysis plan with a stopping rule?
Did the stopping rule involve few interim looks and a stringent p-value (e.g. <0.001)?
As the number of interim looks increases, the probability of a false positive or an overestimated effect also increases. This can be mitigated by (1) minimizing the number of interim looks and (2) using a stricter threshold for statistical significance that accounts for these multiple interim analyses.
Some common interim analysis strategies used (Schulz KF et al.) are:
Pocock: To keep the overall trial p-value threshold (alpha) = 0.05, the number of interim analyses is pre-defined and all share the same adjusted statistical significance threshold (i.e. p<0.029 for 2 planned analyses, p<0.016 for 5 planned analyses, and so forth).
Peto: Assign the final analysis p-value threshold = 0.05 (like in a conventional trial), but have a more stringent threshold (i.e. p<0.001) for the interim analyses.
O’Brien-Fleming: Interim thresholds start very stringent and successively ease as they approach the final analysis (e.g. for 3 interim analyses and a final analysis, the sequence of p-value thresholds is 0.0001, 0.004, 0.019, 0.043).
Lan-DeMets: An adaptable (alpha-spending) approach where the significance thresholds and the timing of analyses can change in accordance with the information observed so far.
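To see why the adjustment matters, the sketch below simulates null trials and compares the overall false-positive rate when every look uses an unadjusted p<0.05 versus the Pocock-adjusted p<0.016 for 5 looks noted above (a hedged illustration; the event risk and batch size are arbitrary assumptions):

```python
import math
import random

def two_sided_p(e1, n1, e2, n2):
    """Two-proportion z-test p-value (normal approximation)."""
    pooled = (e1 + e2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0
    z = (e1 / n1 - e2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2))

def ever_significant(rng, thresholds, risk=0.2, batch=200):
    """Null trial (true RR = 1.0) with one analysis per threshold;
    True if any analysis crosses its threshold (a false positive)."""
    e1 = e2 = n = 0
    for thr in thresholds:
        for _ in range(batch):
            e1 += rng.random() < risk
            e2 += rng.random() < risk
        n += batch
        if two_sided_p(e1, n, e2, n) < thr:
            return True
    return False

rng = random.Random(2)
n_sim = 4000
fp_naive = sum(ever_significant(rng, [0.05] * 5) for _ in range(n_sim)) / n_sim
fp_pocock = sum(ever_significant(rng, [0.016] * 5) for _ in range(n_sim)) / n_sim
print(f"overall false-positive rate, 5 looks at unadjusted p<0.05: {fp_naive:.3f}")
print(f"overall false-positive rate, 5 looks at Pocock p<0.016:   {fp_pocock:.3f}")
```

With five unadjusted looks the chance of a false-positive "significant" result at some point in the trial climbs well above the nominal 5%, while the Pocock thresholds hold the overall rate near 5%.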
Did enough endpoint events occur?
Trials stopped early for benefit exaggerate the relative effect of an intervention by an average of 29% compared with trials that conclude as planned (Bassler D et al.). As events accumulate, fluctuations in the effect estimate become smaller and the risk of bias decreases (see graph above). Optimally, ≥500 events should occur before stopping (Bassler D et al.), after which the average exaggeration decreases to 12%.
For these reasons, skepticism is warranted for any relative risk reduction (RRR) ≥50% generated in truncated trials with <100 events (Pocock SJ et al., Montori VM et al.). The larger the number of events and the more plausible the RRR (e.g. ~20-30% is typical for the impact of cardiovascular pharmacotherapy on cardiovascular events), the more believable the results.
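The link between few events and inflated RRRs can be illustrated with a winner's-curse simulation (again a sketch with assumed numbers: a true RRR of 20%, a control risk of 20%, and a single analysis at p<0.05 rather than a full sequential design). Among trials that reach significance, the observed RRR is far more inflated when few events have occurred:

```python
import math
import random

def two_sided_p(e1, n1, e2, n2):
    """Two-proportion z-test p-value (normal approximation)."""
    pooled = (e1 + e2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0
    z = (e1 / n1 - e2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2))

def significant_rrr(rng, n_per_arm, risk_c=0.20, risk_t=0.16):
    """One trial with a true RRR of 20% (RR = 0.8). Returns the observed
    RRR if the result is significant in the benefit direction, else None."""
    e_t = sum(rng.random() < risk_t for _ in range(n_per_arm))
    e_c = sum(rng.random() < risk_c for _ in range(n_per_arm))
    if e_c == 0:
        return None
    if e_t < e_c and two_sided_p(e_t, n_per_arm, e_c, n_per_arm) < 0.05:
        return 1 - (e_t / n_per_arm) / (e_c / n_per_arm)
    return None

rng = random.Random(3)
n_sim = 3000
small = [r for r in (significant_rrr(rng, 250) for _ in range(n_sim)) if r is not None]
large = [r for r in (significant_rrr(rng, 1500) for _ in range(n_sim)) if r is not None]
mean_small = sum(small) / len(small)   # ~90 total events expected per trial
mean_large = sum(large) / len(large)   # ~540 total events expected per trial
print("true RRR: 20%")
print(f"mean observed RRR in significant small trials: {mean_small:.0%}")
print(f"mean observed RRR in significant large trials: {mean_large:.0%}")
```

The small trials that happen to reach significance do so mostly by overshooting the true effect, whereas the large trials, with several hundred events, can reach significance while estimating an RRR close to the true 20%.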
Relative risk (or risk ratio) is the risk in one group relative to (divided by) the risk in another group. For example, if 10% in the treatment group and 20% in the placebo group have the outcome of interest, the relative risk is 0.5 (10% ÷ 20%), i.e. the treatment group has half the risk of the placebo group. See here for a more detailed discussion.
Systematic deviation of an estimate from the truth (either an overestimation or underestimation) caused by a study design or conduct feature. See the Catalog of Bias for specific biases, explanations, and examples.
Randomized controlled trials are those in which participants are randomly allocated to two or more groups which are given different treatments.
A primary outcome is an outcome from which trial design choices are based (e.g. sample size calculations). Primary outcomes are not necessarily the most important outcomes.
Expresses the effect of an intervention as a ratio against the comparator group (i.e. intervention group measure ÷ comparator group measure). Used for binary outcomes. The relative risk, odds ratio, and hazard ratio are all relative measures of effect. For example, if the risk of developing neuropathy was 1% in the treatment group and 2% in the comparator group, then the relative risk is 0.5 (1 ÷ 2). See the Absolute Risk Differences and Relative Measures of Effect discussion here for more information.
The proportional reduction in risk with the intervention relative to the comparator, i.e. 1 minus the relative risk (RR). If the intervention has a RR of 70% relative to the comparator, then the relative risk reduction is 30% (100% − 70%).
This is the most accessible healthcare setting where generalist services are provided. For example, a family medicine clinic.
An outcome which consists of multiple component endpoints. For example, a cardiovascular composite may include stroke, myocardial infarction, and death.