When quality improvement fails

In this week’s PhD journal club, Darren Troeman discussed the paper “Effect of a multifaceted educational intervention for anti-infectious measures on sepsis mortality: a cluster randomized trial”. The aim was to improve compliance with guidelines, thereby reducing the time to the start of antimicrobial therapy (AT), which in turn should reduce 28-day mortality. The intervention was compared with conventional medical education. Disappointingly, the trial provided more lessons for trialists than for healthcare providers.

When implementing a new education program for physicians, you cannot evaluate its effects by randomizing individual patients (as if physicians would apply the lessons learned to some patients but not to others). A clustered design is needed, so the researchers randomized 40 hospitals (i.e., clusters) to the intervention (n=19) or the control group (n=21). In the end, the intervention group included 2,596 patients and the control group 1,587.

Surprisingly, the intervention was associated with a statistically significantly higher day-28 mortality (35.1% vs 26.7%, p=0.01). But what was the “intervention”? The time to AT did not differ significantly between the groups (medians of 1.5 and 2.0 hours in the intervention and control group, respectively). So if “no intervention” led to higher mortality, there must have been a difference between the two populations. And indeed, this was very obvious in the baseline table. A world turned upside down: instead of a “balanced baseline and different measure”, we got a “different baseline and balanced measure”.

How can this be prevented? The risk of baseline differences depends on the number of units (patients or clusters) being randomized: the fewer units, the larger the chance of imbalance. In this case, randomizing more hospitals would have reduced that risk, as would a cross-over of the intervention (because each hospital then serves as its own control, which balances the baseline differences). However, a quality improvement intervention is difficult to withdraw once implemented, which rules out an effective cross-over. With “only” 40 hospitals, a stepped-wedge design might have been a better choice. In that design, all hospitals start in the control condition and switch to the intervention at randomly assigned time points, so each hospital provides an internal comparison of the before and after period, neutralizing the baseline differences between hospitals.
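To make the point about the number of randomized units concrete, here is a small simulation sketch. It is not a reanalysis of this trial. It assumes a hypothetical standardized baseline severity score per hospital, drawn from a standard normal distribution, randomly allocates half of the hospitals to the intervention, and counts how often the two arms differ by more than an arbitrarily chosen 0.5 standard deviations. The function name `imbalance_probability` and all parameter values are illustrative assumptions.

```python
import random
import statistics


def imbalance_probability(n_clusters, threshold=0.5, n_sims=2000, seed=2024):
    """Estimate how often cluster randomization leaves a baseline imbalance.

    Assumption (hypothetical): each hospital has a standardized baseline
    severity score drawn from N(0, 1). Half of the hospitals (rounded down)
    are randomly allocated to the intervention arm. Returns the fraction of
    simulated trials in which the difference in mean baseline severity
    between the arms exceeds `threshold` standard deviations.
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    n_treated = n_clusters // 2
    exceed = 0
    for _ in range(n_sims):
        # Draw a baseline score for every hospital in this simulated trial.
        severity = [rng.gauss(0.0, 1.0) for _ in range(n_clusters)]
        # Randomly pick which hospitals receive the intervention.
        treated = set(rng.sample(range(n_clusters), n_treated))
        treated_scores = [s for i, s in enumerate(severity) if i in treated]
        control_scores = [s for i, s in enumerate(severity) if i not in treated]
        diff = statistics.fmean(treated_scores) - statistics.fmean(control_scores)
        if abs(diff) > threshold:
            exceed += 1
    return exceed / n_sims


if __name__ == "__main__":
    for n in (10, 40, 200):
        p = imbalance_probability(n)
        print(f"{n:>4} clusters: P(|imbalance| > 0.5 SD) ~ {p:.3f}")
```

Under these assumptions the difference in arm means is normally distributed with variance 1/n_treated + 1/n_control, so the expected probabilities are roughly 43% for 10 hospitals, 11% for 40 and well below 1% for 200; the simulation should land close to these values. The sketch ignores differences in cluster size and in patient mix within hospitals, which real cluster trials also have to contend with.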

The real killer, though, was that the intervention itself was unsuccessful: it did not reduce the time to the start of AT. The authors also explored why: “Semi-structured expert interviews with the local quality improvement team leaders at the end of the intervention blamed failure to establish an effective multidisciplinary quality improvement team. The interviewees criticized insufficient staff time dedicated to the project, lack of human resources, lack of support by other department heads and hospital leadership, and especially failure to achieve consistent involvement of emergency departments and involvement of regular wards in the quality improvement efforts.” Apparently, the participating hospitals were not (yet?) ready for this intervention. And… a median of 2 hours to AT in the control group does not seem that bad at all. Yet there is no information about the expected reduction in time to AT! Here, a pilot study could have prevented a lot of frustration.

So, from this study the PhD students learned that researchers should think carefully about the implications of their choice of study design and intervention, as the obvious choice may not always be the appropriate one. As for this trial, the authors tried to draw firm conclusions from what had effectively turned into an observational cohort study. Meanwhile, the research question that started all this remains unanswered.

Darren