This week’s publication of the highly controversial results of the MERINO trial in JAMA caused quite a stir on social media. The paper has been viewed >50,000 times and the unexpected outcome has been challenged by many. But what was the conclusion in JAMA? “Among patients with E. coli or K. pneumoniae bloodstream infection (BSI) and ceftriaxone resistance, definitive treatment with piperacillin-tazobactam compared with meropenem did not result in a non-inferior 30-day mortality.” “Not” and “non” in the same sentence, a double negative, is confusing. More importantly, as formulated, the study was inconclusive, which nobody seems to accept. We dived into the depths of the reporting and tried to explain it.
In short, in the MERINO trial, patients with BSI caused by ceftriaxone-nonsusceptible E. coli or K. pneumoniae were randomized, after 3 days of empiric therapy, to intravenous pip-tazo or meropenem. In the pip-tazo group 23 of 187 patients (12.3%) died within 30 days, compared to 7 of 191 (3.7%) in the meropenem group (risk difference, 8.6% [1-sided 97.5% confidence interval (CI), −∞ to 14.5%]). With a pre-defined non-inferiority margin of 5%, the authors correctly concluded that these data do not support that pip-tazo is non-inferior to meropenem.
What is actually done in a non-inferiority trial? Before the study one decides what level of inferiority would still be clinically acceptable: in this study a 5% increased 30-day mortality with pip-tazo. After the study one gets an estimate of the difference and a CI: in this study 30-day mortality was 8.6% higher with pip-tazo, with a CI that goes from 14.5% all the way down to minus infinity, thereby fully overlapping the pre-set non-inferiority margin and the zero line, even including the possibility that pip-tazo may be superior. The only possible conclusion: non-inferiority not demonstrated.
Is, in general, no evidence for non-inferiority the same as one of the two options being inferior (and the other superior)? Generally speaking: no (absence of evidence does not equal evidence of absence), but sometimes you can conclude superiority or inferiority in a non-inferiority trial.
Let’s look at the reported CI, with the lower bound going to -∞ (minus infinity). When using a two-sided confidence interval instead, three different situations with three vastly different conclusions are possible (situation 1, 2 or 3 in the figure):
1) Results are inconclusive: the CI includes 0, so a true risk difference of 0 (or any other value below the 5% margin) is still plausible. No discussion, we need more data.
2) Statistically inferior (lower bound of the CI >0), but a true difference within the non-inferiority margin cannot be excluded (lower bound of the CI <5%). Most would nevertheless not recommend pip-tazo in clinical practice given these results, and a new trial doesn’t seem ethical.
3) Statistically and clinically inferior. Clear conclusion, no discussion.
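The three situations can be sketched as a small decision rule. This is an illustration only: the function name `classify_ci`, its arguments and the example bounds are our own assumptions, not something taken from the trial report. Risk differences are pip-tazo minus meropenem, written as fractions, and the extra branch for an upper bound below the margin is added for completeness.

```python
def classify_ci(lower: float, upper: float, margin: float = 0.05) -> str:
    """Classify a two-sided CI for the risk difference (new minus reference).

    lower, upper: CI bounds as fractions (0.05 means 5 percentage points).
    margin: pre-defined non-inferiority margin (assumed 5%, as in the post).
    """
    if upper < margin:
        # Not one of the three situations in the list: non-inferiority shown.
        return "non-inferiority demonstrated"
    if lower > margin:
        return "situation 3: statistically and clinically inferior"
    if lower > 0:
        return "situation 2: statistically inferior, margin not excluded"
    return "situation 1: inconclusive"


# Hypothetical intervals, chosen only to hit each branch.
for bounds in [(-0.02, 0.12), (0.01, 0.12), (0.06, 0.12)]:
    print(bounds, "->", classify_ci(*bounds))
```

The order of the checks matters: the lower bound is compared with the margin first, so an interval lying entirely above 5% is never labelled as merely "statistically inferior".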
So what is the most likely situation in the MERINO trial? Based on the crude data, the 2-sided 95% CI would be 3.3% to 14.5%, implying that pip-tazo is statistically inferior to meropenem. Why is that not concluded?
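For readers who want to check the arithmetic, here is a minimal sketch that computes a two-sided 95% CI for the risk difference from the crude counts. It assumes a simple Wald (normal approximation) interval; the function name `risk_difference_ci` is ours, and the method used in the paper may differ, so expect bounds close to, but not necessarily identical to, the ones quoted above.

```python
from math import sqrt
from statistics import NormalDist


def risk_difference_ci(events_a, n_a, events_b, n_b, level=0.95):
    """Wald CI for the risk difference p_a - p_b (an assumed method)."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    diff = p_a - p_b
    # Unpooled standard error of the difference between two proportions.
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    # Two-sided critical value, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return diff, diff - z * se, diff + z * se


# Crude 30-day mortality: pip-tazo 23/187, meropenem 7/191.
diff, lower, upper = risk_difference_ci(23, 187, 7, 191)
print(f"risk difference {diff:.1%}, 95% CI {lower:.1%} to {upper:.1%}")
```

With this approximation the lower bound stays above 0 and below the 5% margin, which corresponds to situation 2 in the list above.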
When the authors first presented their preliminary results at ECCMID’s late-breaker session, they presented two-sided CIs and concluded that pip-tazo was inferior to meropenem. So why did they choose to report only one-sided CIs in the JAMA paper? Perhaps it is JAMA policy to present non-inferiority trials with a one-sided CI, which contrasts with the CONSORT statement for non-inferiority trials (also published in JAMA), which allows 2-sided CIs and the possibility of superiority and inferiority conclusions in non-inferiority trials. In fact, some state that it makes no sense to ignore evidence on superiority/inferiority if a trial produces such evidence, even if unexpected. It seems like one side of the CI got lost in the peer-review/editorial process. Yet, despite the rather cryptic conclusion, we, like the JAMA editorial, conclude that the MERINO trial demonstrates that pip-tazo is inferior to meropenem (and hence that meropenem is superior to pip-tazo).
Why is meropenem superior? This was a study of patients with relatively mild disease (98% had a Pitt bacteremia score <4, median 1.0), of whom 67% received appropriate antibiotics within 5.5 hours after blood cultures were taken (in the pip-tazo group), and of whom 40.7% had resolution of clinical symptoms at the time of randomization. The 30-day mortality of 3.7% in the meropenem group is, therefore, less surprising than the 12.3% mortality with pip-tazo. The question is not how meropenem prevented mortality, but how pip-tazo killed patients. In eTable 6 the authors provide short narratives of all fatal serious adverse events, and death following treatment failure of the infection for which pip-tazo was prescribed was not observed. Most patients died from malignancies or yeast infections, against which both antibiotics are considered ineffective.
On social media the authors already responded to this point: “We believe that inadequately treated infection (as may have been provided by pip-tazo) pushed people with significant comorbidities over the edge. While these people may have been destined to die from that underlying comorbidity, their death was hastened by suboptimal BSI treatment.” Could be, and their belief would be supported by rapid disappearance of the difference at day 60 and day 90.
We previously explained how lucky the Danes had been with demonstrating non-inferiority in an “underpowered” non-inferiority trial. Were the MERINO investigators equally unlucky, in getting a false-positive result? Well, the chance of the outcomes as observed for 2 antibiotics that are equivalent is 0.19%. Still not 0, but how low do you wanna go?
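One calculation that gives a probability of this size is a two-sided z-test on the two mortality proportions with a pooled variance (equivalent to a chi-square test without continuity correction). Whether this is the test behind the 0.19% is our assumption; the function name `two_proportion_p_value` is also ours. A minimal sketch:

```python
from math import erfc, sqrt


def two_proportion_p_value(events_a, n_a, events_b, n_b):
    """Two-sided p-value for H0: equal event risks (pooled z-test)."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    # Under H0 both groups share one risk, estimated from the pooled data.
    pooled = (events_a + events_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return z, erfc(abs(z) / sqrt(2))


# Crude 30-day mortality: pip-tazo 23/187, meropenem 7/191.
z, p = two_proportion_p_value(23, 187, 7, 191)
print(f"z = {z:.2f}, two-sided p = {p:.2%}")
```

An exact test (for example Fisher's) would give a somewhat different number, so treat the output as a plausibility check rather than a reproduction of the original calculation.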
This blog was co-authored by Valentijn Schweitzer, Tim Deelen and Henri van Werkhoven and emerged (again) from our EEWMM (extreme-early Wednesday morning meeting).
Thank you for pointing out the absurdity of the JAMA guidelines on 1-sided confidence intervals. We made a similar point in a paper – Superiority and non-inferiority: two sides of the same coin? – which was recently published in Trials.
https://rdcu.be/6Zgp
One of the medical microbiologists in training from Breda/Tilburg kindly pointed out that one of the figure labels is incorrect. “3: statistically and clinically non-inferior” should have been “3: statistically and clinically inferior”. Quite a different interpretation. Thanks for the “peer review” Pepijn!