{"id":91,"date":"2021-05-22T23:28:02","date_gmt":"2021-05-23T03:28:02","guid":{"rendered":"https:\/\/pressbooks.bccampus.ca\/rickyturgeon\/?post_type=chapter&#038;p=91"},"modified":"2022-01-11T00:23:36","modified_gmt":"2022-01-11T05:23:36","slug":"truncated-studies-was-the-trial-stopped-early-for-overwhelming-evidence-of-benefit-or-futility","status":"publish","type":"chapter","link":"https:\/\/pressbooks.bccampus.ca\/rickyturgeon\/chapter\/truncated-studies-was-the-trial-stopped-early-for-overwhelming-evidence-of-benefit-or-futility\/","title":{"raw":"Truncated studies: Was the trial stopped early for \u201coverwhelming\u201d evidence of benefit or futility?","rendered":"Truncated studies: Was the trial stopped early for \u201coverwhelming\u201d evidence of benefit or futility?"},"content":{"raw":"Studies may be stopped early for efficacy because of an ethical obligation not to expose participants to a less effective treatment (or placebo) for longer than necessary. In other words, once it is sufficiently clear that an intervention is efficacious, there is reason to end the trial.\r\n\r\nHowever, stopping early risks overestimating the effect size of the intervention. The estimated effect fluctuates randomly around the true effect over time, with larger fluctuations early in the trial when few events have accrued, so an interim look may trigger a premature stop based on an exaggerated estimate of the true effect size.\r\n\r\nConsider the following simulated trial in which there is no true difference between the groups (i.e. [pb_glossary id=\"109\"]RR[\/pb_glossary] = 1.0):\r\n\r\n[caption id=\"attachment_781\" align=\"alignnone\" width=\"1024\"]<img class=\"wp-image-781 size-large\" src=\"https:\/\/pressbooks.bccampus.ca\/rickyturgeon\/wp-content\/uploads\/sites\/1246\/2021\/05\/True-Effect-Graph-1024x531.png\" alt=\"\" width=\"1024\" height=\"531\" \/> Graph 2. Relative risk vs. number of events in a simulated trial. 
Created via Microsoft Excel using the RAND function to generate randomized event-data for two groups.[\/caption]\r\n\r\nAs depicted in Graph 2 above, there is random deviation from the true effect as events accumulate. If the trial had interim analyses for benefit every 100 events and kept the standard <a href=\"https:\/\/pressbooks.bccampus.ca\/rickyturgeon\/back-matter\/appendix\/\" target=\"_blank\" rel=\"noopener\">p<\/a>&lt;0.05 significance threshold without adjusting for the repeated interim looks, it might have stopped at 100 events, when the [pb_glossary id=\"109\"]RR[\/pb_glossary] was 1.3, which we know exaggerates the true effect ([pb_glossary id=\"109\"]RR[\/pb_glossary] = 1.0, i.e. no effect).\r\n\r\nAs a simplified analogy, imagine trying to judge whether a chess player is above average (and by what margin) from their win percentage. One approach is to wait 50 matches, then assess their win percentage. However, waiting that long may waste time if the player is clearly skilled (e.g. winning 90% of their first 10 games). Instead, their skill could be assessed every 5 matches (up to a maximum of 50 matches), and observation stopped if they seem sufficiently impressive at one of these interim assessments. While this might save time, it carries a risk: if by pure chance the player goes on a win streak, observation is likely to end early. Even if the player truly is above average, an early stop is most likely to occur during such a hot streak, which introduces [pb_glossary id=\"193\"]bias[\/pb_glossary] into the assessment (e.g. estimating their win probability as 80% because of the streak, when it is actually only 60%).\r\n\r\nThis is the major concern with stopping early: an early stop systematically tends to coincide with an overestimate of the effect. 
Precautions such as a prespecified stopping rule, stringent interim significance thresholds, and a minimum number of endpoint events cannot prevent [pb_glossary id=\"193\"]bias[\/pb_glossary] towards overestimation, but they can reduce its extent, as discussed below.\r\n<h1>Checklist Questions<\/h1>\r\n<table class=\"grid\" style=\"border-collapse: collapse;width: 100%;height: 54px\" border=\"0\">\r\n<tbody>\r\n<tr style=\"height: 18px\">\r\n<td style=\"width: 100%;height: 18px\">Was there a predefined interim analysis plan with a stopping rule?<\/td>\r\n<\/tr>\r\n<tr style=\"height: 18px\">\r\n<td style=\"width: 100%;height: 18px\">Did the stopping rule involve few interim looks and a stringent <a href=\"https:\/\/pressbooks.bccampus.ca\/rickyturgeon\/back-matter\/appendix\/\" target=\"_blank\" rel=\"noopener\">p-value<\/a> (e.g. &lt;0.001)?<\/td>\r\n<\/tr>\r\n<tr style=\"height: 18px\">\r\n<td style=\"width: 100%;height: 18px\">Did enough endpoint events occur?<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<h1>Was there a predefined interim analysis plan with a stopping rule?<\/h1>\r\n<div>Without a pre-planned stopping rule, there is no assurance that adequate safeguards were in place to minimize [pb_glossary id=\"193\"]bias[\/pb_glossary] from an early stop.<\/div>\r\n<div class=\"textbox shaded\"><em>E.g. In JUPITER (<a href=\"https:\/\/pressbooks.bccampus.ca\/rickyturgeon\/back-matter\/references\/\" target=\"_blank\" rel=\"noopener\">Ridker PM et al.<\/a>), an [pb_glossary id=\"704\"]RCT[\/pb_glossary] of rosuvastatin vs. placebo in a highly selected primary cardiovascular prevention population, the pre-planned stopping rule was mentioned, though poorly described, in an early report: \u201cFrequency of interim efficacy analyses and rules for early trial termination have been prespecified and approved by all members of this board.\u201d<\/em><\/div>\r\n<h1>Did the stopping rule involve few interim looks and a stringent p-value (e.g. &lt;0.001)?<\/h1>\r\n<div style=\"font-weight: 400\">\r\n\r\nAs the number of interim looks increases, so does the probability of a false-positive result or an overestimated effect. 
This can be mitigated by (1) minimizing the number of interim looks and (2) using a stricter significance threshold that accounts for the multiple interim analyses.\r\n\r\nCommon interim analysis strategies (<a href=\"https:\/\/pressbooks.bccampus.ca\/rickyturgeon\/back-matter\/references\/\" target=\"_blank\" rel=\"noopener\">Schulz KF et al.<\/a>) include:\r\n<strong>Pocock<\/strong>: The number of analyses is pre-defined, and every analysis uses the same adjusted significance threshold, chosen to keep the overall trial <a href=\"https:\/\/pressbooks.bccampus.ca\/rickyturgeon\/back-matter\/appendix\/\" target=\"_blank\" rel=\"noopener\">p-value<\/a> threshold (alpha) at 0.05 (e.g. p&lt;0.029 for 2 planned analyses, p&lt;0.016 for 5 planned analyses, and so forth).\r\n<strong>Peto<\/strong>: Use the conventional <a href=\"https:\/\/pressbooks.bccampus.ca\/rickyturgeon\/back-matter\/appendix\/\" target=\"_blank\" rel=\"noopener\">p-value<\/a> threshold of 0.05 for the final analysis, but a much more stringent threshold (e.g. <a href=\"https:\/\/pressbooks.bccampus.ca\/rickyturgeon\/back-matter\/appendix\/\" target=\"_blank\" rel=\"noopener\">p<\/a>&lt;0.001) for the interim analyses.\r\n<strong>O'Brien-Fleming<\/strong>: Use very stringent thresholds at the early interim analyses that become progressively less stringent as the trial approaches the final analysis (e.g. for 3 interim analyses and a final analysis, <a href=\"https:\/\/pressbooks.bccampus.ca\/rickyturgeon\/back-matter\/appendix\/\" target=\"_blank\" rel=\"noopener\">p-value<\/a> thresholds of 0.0001, 0.004, 0.019, and 0.043).\r\n<strong>Lan-DeMets:<\/strong> A flexible \u201calpha-spending\u201d approach in which the number and timing of interim analyses need not be fixed in advance; the significance threshold at each analysis depends on how much of the planned information (e.g. number of events) has accrued by that point.\r\n\r\n<\/div>\r\n<div style=\"font-weight: 400\">\r\n<div class=\"textbox shaded\"><em>E.g. JUPITER (<a href=\"https:\/\/pressbooks.bccampus.ca\/rickyturgeon\/back-matter\/references\/\" target=\"_blank\" rel=\"noopener\">Ridker PM et al.<\/a>) was stopped after the first of two planned interim analyses, using \u201cO\u2019Brien-Fleming stopping boundaries determined by means of the Lan-DeMets approach\u201d (requiring a <a href=\"https:\/\/pressbooks.bccampus.ca\/rickyturgeon\/back-matter\/appendix\/\" target=\"_blank\" rel=\"noopener\">p-value<\/a> &lt;0.005). 
The actual <a href=\"https:\/\/pressbooks.bccampus.ca\/rickyturgeon\/back-matter\/appendix\/\" target=\"_blank\" rel=\"noopener\">p-value<\/a> for the [pb_glossary id=\"1517\"]primary endpoint[\/pb_glossary] was &lt;0.00001.<\/em><\/div>\r\n<\/div>\r\n<h1>Did enough endpoint events occur?<\/h1>\r\n<div style=\"font-weight: 400\">\r\n\r\nTrials stopped early for benefit exaggerate the [pb_glossary id=\"119\"]relative effect[\/pb_glossary] of an intervention by an average of 29% compared with trials that conclude as planned (<a href=\"https:\/\/pressbooks.bccampus.ca\/rickyturgeon\/back-matter\/references\/\" target=\"_blank\" rel=\"noopener\">Bassler D et al.<\/a>). As events accumulate, fluctuations in the effect estimate become smaller and the risk of [pb_glossary id=\"193\"]bias[\/pb_glossary] decreases (see Graph 2). Ideally, at least 500 events should occur before stopping (<a href=\"https:\/\/pressbooks.bccampus.ca\/rickyturgeon\/back-matter\/references\/\" target=\"_blank\" rel=\"noopener\">Bassler D et al.<\/a>); beyond this point, the average exaggeration decreases to 12%.\r\n\r\n<\/div>\r\n<div style=\"font-weight: 400\">\r\n\r\nFor these reasons, skepticism is warranted for any [pb_glossary id=\"1111\"]relative risk reduction (RRR)[\/pb_glossary] \u226550% reported by a truncated trial with &lt;100 events (<a href=\"https:\/\/pressbooks.bccampus.ca\/rickyturgeon\/back-matter\/references\/\" target=\"_blank\" rel=\"noopener\">Pocock SJ et al., Montori VM et al.<\/a>). 
The larger the number of events and the more plausible the [pb_glossary id=\"1111\"]RRR[\/pb_glossary] (e.g. ~20-30% is typical for the effect of cardiovascular pharmacotherapy on cardiovascular events), the more believable the results.\r\n\r\n<\/div>\r\n<div style=\"font-weight: 400\">\r\n<div class=\"textbox shaded\"><em>E.g. In JUPITER (<a href=\"https:\/\/pressbooks.bccampus.ca\/rickyturgeon\/back-matter\/references\/\" target=\"_blank\" rel=\"noopener\">Ridker PM et al.<\/a>), 393 [pb_glossary id=\"1517\"]primary[\/pb_glossary] ([pb_glossary id=\"359\"]composite[\/pb_glossary]) endpoint events had occurred across the two groups by the interim analysis. The [pb_glossary id=\"1111\"]RRR[\/pb_glossary] for the [pb_glossary id=\"1517\"]primary endpoint[\/pb_glossary] was 44%, and the [pb_glossary id=\"1111\"]RRRs[\/pb_glossary] for individual components ranged from 18% to 54%.<\/em><\/div>\r\n<\/div>","rendered":"<p>Studies may be stopped early for efficacy because of an ethical obligation not to expose participants to a less effective treatment (or placebo) for longer than necessary. In other words, once it is sufficiently clear that an intervention is efficacious, there is reason to end the trial.<\/p>\n<p>However, stopping early risks overestimating the effect size of the intervention. The estimated effect fluctuates randomly around the true effect over time, with larger fluctuations early in the trial when few events have accrued, so an interim look may trigger a premature stop based on an exaggerated estimate of the true effect size.<\/p>\n<p>Consider the following simulated trial in which there is no true difference between the groups (i.e. 
<a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_91_109\">RR<\/a> = 1.0):<\/p>\n<figure id=\"attachment_781\" aria-describedby=\"caption-attachment-781\" style=\"width: 1024px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-781 size-large\" src=\"https:\/\/pressbooks.bccampus.ca\/rickyturgeon\/wp-content\/uploads\/sites\/1246\/2021\/05\/True-Effect-Graph-1024x531.png\" alt=\"\" width=\"1024\" height=\"531\" srcset=\"https:\/\/pressbooks.bccampus.ca\/rickyturgeon\/wp-content\/uploads\/sites\/1246\/2021\/05\/True-Effect-Graph-1024x531.png 1024w, https:\/\/pressbooks.bccampus.ca\/rickyturgeon\/wp-content\/uploads\/sites\/1246\/2021\/05\/True-Effect-Graph-300x156.png 300w, https:\/\/pressbooks.bccampus.ca\/rickyturgeon\/wp-content\/uploads\/sites\/1246\/2021\/05\/True-Effect-Graph-768x399.png 768w, https:\/\/pressbooks.bccampus.ca\/rickyturgeon\/wp-content\/uploads\/sites\/1246\/2021\/05\/True-Effect-Graph-1536x797.png 1536w, https:\/\/pressbooks.bccampus.ca\/rickyturgeon\/wp-content\/uploads\/sites\/1246\/2021\/05\/True-Effect-Graph-2048x1063.png 2048w, https:\/\/pressbooks.bccampus.ca\/rickyturgeon\/wp-content\/uploads\/sites\/1246\/2021\/05\/True-Effect-Graph-65x34.png 65w, https:\/\/pressbooks.bccampus.ca\/rickyturgeon\/wp-content\/uploads\/sites\/1246\/2021\/05\/True-Effect-Graph-225x117.png 225w, https:\/\/pressbooks.bccampus.ca\/rickyturgeon\/wp-content\/uploads\/sites\/1246\/2021\/05\/True-Effect-Graph-350x182.png 350w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption id=\"caption-attachment-781\" class=\"wp-caption-text\">Graph 2. Relative risk vs. number of events in a simulated trial. Created via Microsoft Excel using the RAND function to generate randomized event-data for two groups.<\/figcaption><\/figure>\n<p>As depicted in Graph 2 above, there is random deviation from the true effect as events accumulate. 
If the trial had interim analyses for benefit every 100 events and kept the standard <a href=\"https:\/\/pressbooks.bccampus.ca\/rickyturgeon\/back-matter\/appendix\/\" target=\"_blank\" rel=\"noopener\">p<\/a>&lt;0.05 significance threshold without adjusting for the repeated interim looks, it might have stopped at 100 events, when the <a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_91_109\">RR<\/a> was 1.3, which we know exaggerates the true effect (<a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_91_109\">RR<\/a> = 1.0, i.e. no effect).<\/p>\n<p>As a simplified analogy, imagine trying to judge whether a chess player is above average (and by what margin) from their win percentage. One approach is to wait 50 matches, then assess their win percentage. However, waiting that long may waste time if the player is clearly skilled (e.g. winning 90% of their first 10 games). Instead, their skill could be assessed every 5 matches (up to a maximum of 50 matches), and observation stopped if they seem sufficiently impressive at one of these interim assessments. While this might save time, it carries a risk: if by pure chance the player goes on a win streak, observation is likely to end early. Even if the player truly is above average, an early stop is most likely to occur during such a hot streak, which introduces <a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_91_193\">bias<\/a> into the assessment (e.g. estimating their win probability as 80% because of the streak, when it is actually only 60%).<\/p>\n<p>This is the major concern with stopping early: an early stop systematically tends to coincide with an overestimate of the effect. 
Precautions such as a prespecified stopping rule, stringent interim significance thresholds, and a minimum number of endpoint events cannot prevent <a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_91_193\">bias<\/a> towards overestimation, but they can reduce its extent, as discussed below.<\/p>\n<h1>Checklist Questions<\/h1>\n<table class=\"grid\" style=\"border-collapse: collapse;width: 100%;height: 54px\">\n<tbody>\n<tr style=\"height: 18px\">\n<td style=\"width: 100%;height: 18px\">Was there a predefined interim analysis plan with a stopping rule?<\/td>\n<\/tr>\n<tr style=\"height: 18px\">\n<td style=\"width: 100%;height: 18px\">Did the stopping rule involve few interim looks and a stringent <a href=\"https:\/\/pressbooks.bccampus.ca\/rickyturgeon\/back-matter\/appendix\/\" target=\"_blank\" rel=\"noopener\">p-value<\/a> (e.g. &lt;0.001)?<\/td>\n<\/tr>\n<tr style=\"height: 18px\">\n<td style=\"width: 100%;height: 18px\">Did enough endpoint events occur?<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h1>Was there a predefined interim analysis plan with a stopping rule?<\/h1>\n<div>Without a pre-planned stopping rule, there is no assurance that adequate safeguards were in place to minimize <a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_91_193\">bias<\/a> from an early stop.<\/div>\n<div class=\"textbox shaded\"><em>E.g. In JUPITER (<a href=\"https:\/\/pressbooks.bccampus.ca\/rickyturgeon\/back-matter\/references\/\" target=\"_blank\" rel=\"noopener\">Ridker PM et al.<\/a>), an <a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_91_704\">RCT<\/a> of rosuvastatin vs. placebo in a highly selected primary cardiovascular prevention population, the pre-planned stopping rule was mentioned, though poorly described, in an early report: \u201cFrequency of interim efficacy analyses and rules for early trial termination have been prespecified and approved by all members of this board.\u201d<\/em><\/div>\n<h1>Did the stopping rule involve few interim looks and a stringent p-value (e.g. &lt;0.001)?<\/h1>\n<div style=\"font-weight: 400\">\n<p>As the number of interim looks increases, so does the probability of a false-positive result or an overestimated effect. 
This can be mitigated by (1) minimizing the number of interim looks and (2) using a stricter significance threshold that accounts for the multiple interim analyses.<\/p>\n<p>Common interim analysis strategies (<a href=\"https:\/\/pressbooks.bccampus.ca\/rickyturgeon\/back-matter\/references\/\" target=\"_blank\" rel=\"noopener\">Schulz KF et al.<\/a>) include:<br \/>\n<strong>Pocock<\/strong>: The number of analyses is pre-defined, and every analysis uses the same adjusted significance threshold, chosen to keep the overall trial <a href=\"https:\/\/pressbooks.bccampus.ca\/rickyturgeon\/back-matter\/appendix\/\" target=\"_blank\" rel=\"noopener\">p-value<\/a> threshold (alpha) at 0.05 (e.g. p&lt;0.029 for 2 planned analyses, p&lt;0.016 for 5 planned analyses, and so forth).<br \/>\n<strong>Peto<\/strong>: Use the conventional <a href=\"https:\/\/pressbooks.bccampus.ca\/rickyturgeon\/back-matter\/appendix\/\" target=\"_blank\" rel=\"noopener\">p-value<\/a> threshold of 0.05 for the final analysis, but a much more stringent threshold (e.g. <a href=\"https:\/\/pressbooks.bccampus.ca\/rickyturgeon\/back-matter\/appendix\/\" target=\"_blank\" rel=\"noopener\">p<\/a>&lt;0.001) for the interim analyses.<br \/>\n<strong>O&#8217;Brien-Fleming<\/strong>: Use very stringent thresholds at the early interim analyses that become progressively less stringent as the trial approaches the final analysis (e.g. for 3 interim analyses and a final analysis, <a href=\"https:\/\/pressbooks.bccampus.ca\/rickyturgeon\/back-matter\/appendix\/\" target=\"_blank\" rel=\"noopener\">p-value<\/a> thresholds of 0.0001, 0.004, 0.019, and 0.043).<br \/>\n<strong>Lan-DeMets:<\/strong> A flexible \u201calpha-spending\u201d approach in which the number and timing of interim analyses need not be fixed in advance; the significance threshold at each analysis depends on how much of the planned information (e.g. number of events) has accrued by that point.<\/p>\n<\/div>\n<div style=\"font-weight: 400\">\n<div class=\"textbox shaded\"><em>E.g. JUPITER (<a href=\"https:\/\/pressbooks.bccampus.ca\/rickyturgeon\/back-matter\/references\/\" target=\"_blank\" rel=\"noopener\">Ridker PM et al.<\/a>) was stopped after the first of two planned interim analyses, using \u201cO\u2019Brien-Fleming stopping boundaries determined by means of the Lan-DeMets approach\u201d (requiring a <a href=\"https:\/\/pressbooks.bccampus.ca\/rickyturgeon\/back-matter\/appendix\/\" target=\"_blank\" rel=\"noopener\">p-value<\/a> &lt;0.005). 
The actual <a href=\"https:\/\/pressbooks.bccampus.ca\/rickyturgeon\/back-matter\/appendix\/\" target=\"_blank\" rel=\"noopener\">p-value<\/a> for the <a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_91_1517\">primary endpoint<\/a> was &lt;0.00001.<\/em><\/div>\n<\/div>\n<h1>Did enough endpoint events occur?<\/h1>\n<div style=\"font-weight: 400\">\n<p>Trials stopped early for benefit exaggerate the <a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_91_119\">relative effect<\/a> of an intervention by an average of 29% compared with trials that conclude as planned (<a href=\"https:\/\/pressbooks.bccampus.ca\/rickyturgeon\/back-matter\/references\/\" target=\"_blank\" rel=\"noopener\">Bassler D et al.<\/a>). As events accumulate, fluctuations in the effect estimate become smaller and the risk of <a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_91_193\">bias<\/a> decreases (see Graph 2). Ideally, at least 500 events should occur before stopping (<a href=\"https:\/\/pressbooks.bccampus.ca\/rickyturgeon\/back-matter\/references\/\" target=\"_blank\" rel=\"noopener\">Bassler D et al.<\/a>); beyond this point, the average exaggeration decreases to 12%.<\/p>\n<\/div>\n<div style=\"font-weight: 400\">\n<p>For these reasons, skepticism is warranted for any <a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_91_1111\">relative risk reduction (RRR)<\/a> \u226550% reported by a truncated trial with &lt;100 events (<a href=\"https:\/\/pressbooks.bccampus.ca\/rickyturgeon\/back-matter\/references\/\" target=\"_blank\" rel=\"noopener\">Pocock SJ et al., Montori VM et al.<\/a>). The larger the number of events and the more plausible the <a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_91_1111\">RRR<\/a> (e.g. ~20-30% is typical for the effect of cardiovascular pharmacotherapy on cardiovascular events), the more believable the results.<\/p>\n<\/div>\n<div style=\"font-weight: 400\">\n<div class=\"textbox shaded\"><em>E.g. In JUPITER (<a href=\"https:\/\/pressbooks.bccampus.ca\/rickyturgeon\/back-matter\/references\/\" target=\"_blank\" rel=\"noopener\">Ridker PM et al.<\/a>), 393 <a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_91_1517\">primary<\/a> (<a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_91_359\">composite<\/a>) endpoint events had occurred across the two groups by the interim analysis. 
The <a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_91_1111\">RRR<\/a> for the <a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_91_1517\">primary endpoint<\/a> was 44%, and the <a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_91_1111\">RRRs<\/a> for individual components ranged from 18% to 54%.<\/em><\/div>\n<\/div>\n<div class=\"glossary\"><span class=\"screen-reader-text\" id=\"definition\">definition<\/span><template id=\"term_91_109\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_91_109\"><div tabindex=\"-1\"><p>Relative risk (or risk ratio) is the risk in one group relative to (divided by) the risk in another group. For example, if 10% of the treatment group and 20% of the placebo group have the outcome of interest, the relative risk is 0.5 (10% \u00f7 20%), i.e. the treatment group has half the risk of the placebo group. See <a href=\"https:\/\/pressbooks.bccampus.ca\/rickyturgeon\/back-matter\/appendix\/\">here<\/a> for a more detailed discussion.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_91_193\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_91_193\"><div tabindex=\"-1\"><p>Systematic deviation of an estimate from the truth (either an overestimation or underestimation) caused by a study design or conduct feature. 
See the <a href=\"https:\/\/catalogofbias.org\/biases\/\">Catalog of Bias<\/a> for specific biases, explanations, and examples.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_91_704\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_91_704\"><div tabindex=\"-1\"><p>Randomized controlled trials are those in which participants are randomly allocated to two or more groups that receive different treatments.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_91_1517\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_91_1517\"><div tabindex=\"-1\"><p>A primary outcome is the outcome on which trial design choices (e.g. sample size calculations) are based. Primary outcomes are not necessarily the most important outcomes.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_91_119\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_91_119\"><div tabindex=\"-1\"><p>Expresses the effect of an intervention as a ratio relative to the comparator group (i.e. intervention group measure \u00f7 comparator group measure). Used for binary outcomes. Relative risk, odds ratio, and hazard ratio are all expressions of relative effect. For example, if the risk of developing neuropathy was 1% in the treatment group and 2% in the comparator group, then the relative risk is 0.5 (1% \u00f7 2%). 
See the Absolute Risk Differences and Relative Measures of Effect discussion <a href=\"https:\/\/pressbooks.bccampus.ca\/rickyturgeon\/back-matter\/appendix\/\">here<\/a> for more information.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_91_1111\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_91_1111\"><div tabindex=\"-1\"><p>The proportional reduction in risk with the intervention relative to the comparator, calculated as 1 - RR. For example, if the intervention has an RR of 0.70 (i.e. 70% of the comparator group's risk), then the relative risk reduction is 30% (100% - 70%).<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_91_290\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_91_290\"><div tabindex=\"-1\"><p>This is the most accessible healthcare setting where generalist services are provided. For example, a family medicine clinic.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_91_359\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_91_359\"><div tabindex=\"-1\"><p>An outcome which consists of multiple component endpoints. 
For example, a cardiovascular composite may include stroke, myocardial infarction, and death.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><\/div>","protected":false},"author":1318,"menu_order":6,"template":"","meta":{"pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[],"contributor":[],"license":[],"class_list":["post-91","chapter","type-chapter","status-publish","hentry"],"part":3,"_links":{"self":[{"href":"https:\/\/pressbooks.bccampus.ca\/rickyturgeon\/wp-json\/pressbooks\/v2\/chapters\/91","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/pressbooks.bccampus.ca\/rickyturgeon\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/pressbooks.bccampus.ca\/rickyturgeon\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/pressbooks.bccampus.ca\/rickyturgeon\/wp-json\/wp\/v2\/users\/1318"}],"version-history":[{"count":26,"href":"https:\/\/pressbooks.bccampus.ca\/rickyturgeon\/wp-json\/pressbooks\/v2\/chapters\/91\/revisions"}],"predecessor-version":[{"id":1863,"href":"https:\/\/pressbooks.bccampus.ca\/rickyturgeon\/wp-json\/pressbooks\/v2\/chapters\/91\/revisions\/1863"}],"part":[{"href":"https:\/\/pressbooks.bccampus.ca\/rickyturgeon\/wp-json\/pressbooks\/v2\/parts\/3"}],"metadata":[{"href":"https:\/\/pressbooks.bccampus.ca\/rickyturgeon\/wp-json\/pressbooks\/v2\/chapters\/91\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/pressbooks.bccampus.ca\/rickyturgeon\/wp-json\/wp\/v2\/media?parent=91"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/\/pressbooks.bccampus.ca\/rickyturgeon\/wp-json\/pressbooks\/v2\/chapter-type?post=91"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/pressbooks.bccampus.ca\/rickyturgeon\/wp-json\/wp\/v2\/contributor?post=91"},{"taxonomy":"license","embeddable":true,"href":"ht
tps:\/\/pressbooks.bccampus.ca\/rickyturgeon\/wp-json\/wp\/v2\/license?post=91"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}