An early switch from IV to oral treatment is one of the pillars of antibiotic stewardship. Oral antibiotics are usually cheaper, and switching shortens hospital stay and with it the risk of healthcare-associated infections. One problem: before we change our current practice, we must demonstrate that the new strategy is safe. The best evidence comes from a non-inferiority trial, but such trials usually require many patients. The solution to that problem: put on your poker face when drafting your sample size calculation and hope for the best. Our Danish colleagues show how.
In the POET trial (NEJM), 400 patients with Gram-positive left-sided endocarditis who were clinically stable after at least 10 days of IV antibiotics were randomised to continue IV treatment (N=199) or switch to oral treatment (N=201). The primary outcome was a composite of all-cause mortality, unplanned cardiac surgery, embolic events, or relapse of bacteraemia with the primary pathogen, assessed up to 6 months after completion of antibiotic therapy. The trial is a major achievement; for instance, 400 of 700 eligible patients agreed to take part in an experiment that also included two transesophageal echocardiograms. The primary endpoint occurred in 24 patients (12.1%) in the IV group and in 18 (9.0%) in the oral group (risk difference (RD) 3.1%, 95% confidence interval (CI) −3.4% to 9.6%). Conclusion: safe to switch.
Valentijn Schweitzer and Henri van Werkhoven, who both know from experience what it takes to conduct a non-inferiority trial with more than 2,000 patients, immediately dove into the statistics. The study turned out to be powered on a non-inferiority margin of 10% and an expected event rate of 10%, which indeed requires roughly 400 patients to demonstrate non-inferiority with 90% power. If there is truly no difference between IV and oral treatment, the study could just as likely have produced the mirror-image result: a risk difference of −3.1% (95% CI −9.6% to 3.4%). Now look at this carefully: this would still “demonstrate” non-inferiority of oral treatment, even though a 9.6% absolute increase in the risk of the composite outcome (from 10% to 19.6%) would lie within the 95% confidence interval. How many clinicians would consider that sufficiently safe?
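The calculation can be reproduced with a minimal sketch. It assumes the textbook normal-approximation formula for comparing two proportions, a one-sided α of 2.5%, and equal event rates in both arms. The exact method and α that POET used are assumptions here, and `ni_sample_size_per_arm` is a hypothetical helper, not the trialists' code.

```python
from math import ceil
from statistics import NormalDist


def ni_sample_size_per_arm(p_control, p_experimental, margin,
                           alpha_one_sided=0.025, power=0.90):
    """Patients per arm for a non-inferiority comparison of two proportions
    (normal approximation, unpooled variance). Assumed formula."""
    z_alpha = NormalDist().inv_cdf(1 - alpha_one_sided)
    z_beta = NormalDist().inv_cdf(power)
    # Variance of the risk difference contributed by one patient per arm
    variance = (p_control * (1 - p_control)
                + p_experimental * (1 - p_experimental))
    # Expected excess risk of the experimental arm (zero if rates are equal)
    expected_diff = p_experimental - p_control
    n = (z_alpha + z_beta) ** 2 * variance / (margin - expected_diff) ** 2
    return ceil(n)


# Assumed design inputs: 10% event rate in both arms, 10% margin
per_arm = ni_sample_size_per_arm(0.10, 0.10, margin=0.10)
print(per_arm, 2 * per_arm)
```

With these inputs the sketch gives 190 patients per arm, or 380 in total. That is close to the 400 patients who were enrolled.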
So Valentijn and Henri asked infectious disease clinicians in our hospital: ‘In these patients, how much additional risk of the composite outcome do you consider clinically acceptable?’ The answer was 3% (unpublished data, but from reliable people). Had the POET trial used a non-inferiority margin of 3% instead of 10%, the required sample size would have been 11 times larger. Yet with 400 patients (adequately powered for a 10% margin but severely underpowered for a 3% margin), the POET trial almost demonstrated non-inferiority at 3%: the lower bound of the confidence interval was −3.4%, only 0.4 percentage points beyond the margin. How is this possible?
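The factor of 11 follows from the standard normal-approximation formula. Assume equal event rates p in both arms, so the expected difference is zero. Then only the margin Δ changes between the two designs, and the required sample size scales with 1/Δ². This derivation is a sketch under that assumption, not the trialists' own calculation:

```latex
n_{\text{per arm}} = \frac{(z_{1-\alpha} + z_{1-\beta})^2 \, 2p(1-p)}{\Delta^2}
\quad\Longrightarrow\quad
\frac{n_{\Delta = 3\%}}{n_{\Delta = 10\%}} = \left(\frac{0.10}{0.03}\right)^2 \approx 11.1
```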
The answer is: sheer luck or foreknowledge (i.e., knowing that oral treatment is better). If there is truly no difference between the two strategies and the trial were repeated many times, few repetitions would yield a risk difference of exactly 0%. The others would vary randomly around 0%, half in favour of and half against oral treatment. To quantify the amount of “sheer luck” in the POET trial, we calculated the probability of demonstrating non-inferiority with a 3% margin, 400 patients and truly no difference between the strategies (Figure). This probability is 16%. Conversely, the probability of an inconclusive study is 84%. Thus, only about one in every six trials would demonstrate non-inferiority. Better not put that in your application for funding.
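The thought experiment of repeating the trial many times can be simulated directly. This sketch makes several assumptions: 200 patients per arm, a true 10% event rate in both arms, a one-sided α of 2.5%, and a simple Wald interval for the risk difference. The Figure may have been computed differently, and `simulate_ni_success` is a hypothetical helper.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulate_ni_success(n_per_arm=200, p_event=0.10, margin=0.03,
                        alpha_one_sided=0.025, n_trials=10_000, seed=1):
    """Fraction of simulated trials that demonstrate non-inferiority when
    IV and oral treatment truly have the same event rate (assumed setup)."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha_one_sided)
    successes = 0
    for _ in range(n_trials):
        # Number of composite-outcome events in each arm
        events_iv = sum(rng.random() < p_event for _ in range(n_per_arm))
        events_oral = sum(rng.random() < p_event for _ in range(n_per_arm))
        p_iv = events_iv / n_per_arm
        p_oral = events_oral / n_per_arm
        # Observed excess risk with oral treatment and its Wald standard error
        harm = p_oral - p_iv
        se = sqrt(p_iv * (1 - p_iv) / n_per_arm
                  + p_oral * (1 - p_oral) / n_per_arm)
        # Non-inferiority is shown if the upper confidence bound of the
        # excess risk stays below the margin
        if harm + z * se < margin:
            successes += 1
    return successes / n_trials


print(f"P(non-inferiority shown) ≈ {simulate_ni_success():.1%}")
```

The estimate lands in the same range as the 16% quoted above. The exact value shifts a little with the interval method and with simulation noise.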
Naturally, the 16% would be higher if oral treatment were truly better. But if that had been known beforehand, it would have been unethical to continue patients on IV treatment.
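The same normal approximation shows how quickly that probability rises with a true oral advantage. This sketch assumes a 10% event rate on IV, 200 patients per arm, a one-sided α of 2.5%, and a standard error treated as known; `prob_ni_shown` is a hypothetical helper. With no true difference it gives about 17%, close to the 16% in the text. With a true 3-percentage-point advantage for oral treatment it rises to roughly 58%.

```python
from math import sqrt
from statistics import NormalDist


def prob_ni_shown(p_iv, p_oral, n_per_arm=200, margin=0.03,
                  alpha_one_sided=0.025):
    """Normal-approximation probability that the upper confidence bound of
    the excess risk with oral treatment (p_oral - p_iv) falls below margin."""
    z = NormalDist().inv_cdf(1 - alpha_one_sided)
    se = sqrt(p_iv * (1 - p_iv) / n_per_arm
              + p_oral * (1 - p_oral) / n_per_arm)
    true_harm = p_oral - p_iv
    # P(observed harm + z * se < margin), observed harm ~ N(true_harm, se^2)
    return NormalDist().cdf((margin - z * se - true_harm) / se)


# True oral advantage in absolute risk (0 = no difference)
for advantage in (0.00, 0.01, 0.02, 0.03):
    p = prob_ni_shown(p_iv=0.10, p_oral=0.10 - advantage)
    print(f"oral better by {advantage:.0%}: {p:.0%}")
```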
This illustrates the non-inferiority dilemma: either use a clinically meaningful non-inferiority margin and get an unfeasibly large sample size that may not get funded, or base the non-inferiority margin on a feasible sample size and hope for the best.
Figure: Probability of staying below the clinically accepted non-inferiority margin in a trial with 400 subjects