The future of infection surveillance is … Google

If you feel that your hospital’s Electronic Health Record (EHR) can do more for you, read this. Not yet peer-reviewed, but still very impressive. Using all 46 billion (!) data points in the EHR from 216,221 patients in 2 hospitals, they predicted (on day 1 of admission) in-hospital mortality, long length of stay and readmission pretty accurately, and much better than existing prediction models. How? Deep learning techniques. Who are they? The paper has 35 authors, of whom 32 work at Google Inc, Mountain View, California.

In infection control we want to measure what happens to our patients and, therefore, perform surveillance of healthcare-associated infections (HAI). Apart from measuring, it would be nice if we could also predict who will develop HAI, and perhaps prevent this from happening. For this we must rely on prediction rules that need to be developed and validated. Yet, as the authors nicely explain, “80% of the effort in an analytic model is preprocessing, merging, customizing, and cleaning data sets, rather than analyzing them for insights.” Furthermore, although the number of potential predictor variables in the EHR is enormous, together they are considered “too big to handle”, and we therefore choose a very limited number of commonly collected variables. These 35 investigators hypothesized that deep learning approaches (higher-level informatics and artificial neural networks) could incorporate the entire EHR, including free-text notes, and so predict clinical problems and outcomes better than state-of-the-art traditional predictive models.

So they drained the EHR systems and confirmed their hypothesis: the area under the receiver operating characteristic curve (AUROC) for in-hospital mortality was 0.95 (95% CI 0.94-0.96) in one hospital and 0.93 (95% CI 0.92-0.94) in the other. It’s kind of worrying to read if you have spent considerable time developing conventional prediction models and diagnostic algorithms. With those we changed the old-fashioned “manual chart review of all patients” into semi-automatic surveillance tools. For prosthetic joint infections we were able to reduce the workload by 85%, while improving data quality and not excluding human interpretation, see. And indeed, the point above about 80% of the effort going into data preparation is true. The Google approach for HAI could be even more efficient, might replace all human input without missing relevant outcomes, and could also predict consequences of HAI.
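For readers less familiar with the AUROC quoted above: it is the probability that a randomly chosen patient who died was given a higher predicted risk than a randomly chosen patient who survived. The sketch below computes it directly from that definition. The function name `auroc` and the eight patients are made up for illustration; this is not the authors' code or data.

```python
def auroc(labels, scores):
    """Probability that a random positive outranks a random negative.

    labels: 1 = event (e.g. in-hospital death), 0 = no event.
    scores: predicted risks; ties between a positive and a negative count 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy cohort: 8 admissions, 3 deaths.
died = [0, 0, 0, 0, 1, 0, 1, 1]
risk = [0.02, 0.05, 0.10, 0.30, 0.25, 0.40, 0.80, 0.90]

print(round(auroc(died, risk), 3))
```

An AUROC of 0.5 means the model ranks patients no better than chance, and 1.0 means every patient who died was ranked above every survivor. Note that it only measures ranking, not whether the predicted risks themselves are accurate.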

That’s the future, but between now and then a few points need to be addressed. This was a retrospective study, and the next step of course is prospective validation in other settings, and then we need to see if the predictions can be used to improve quality of care. The authors also admit that their methods “are computationally intensive and at present require specialized expertise to implement.” Yet, as with genomics, the availability and accessibility of machine learning is rapidly expanding.

And what do we, clinicians and infection prevention specialists, think? Again the authors: “Clinicians have historically distrusted neural network models because of their opaqueness.” True, but we usually trust the opinion coming out of the neural network of a highly respected key opinion leader. Now, let’s see what peer review does to this manuscript, but my “black-box neural network under my skull” tells me that this approach will be an important part of the future.


2 thoughts on “The future of infection surveillance is … Google”

  1. With the complexity and data richness of EHRs, it is the future. IPs and hospital epidemiologists have to begin framing their future and adapting their work with this in mind. This is a great opportunity to recreate what our chosen profession will be like and how we will continue to provide our expertise in improving patient and staff safety. We should be leading and spearheading this change, not fearing it or pushing back against it.


  2. Thanks for pointing out this interesting article, Marc. For the ones still practising logistic regression for predictions, another good read on the use of machine learning is by Frank Harrell, which tells us that traditional statistical approaches may still carry some value (as long as you don’t use sens and spec). If I am understanding correctly, machine learning techniques are focused on so-called classification problems, and converting these results to accurate probabilities doesn’t always go smoothly. For example, also in this Google study, some of the calibration plots look pretty awful. Logistic regression may in fact be better when well-calibrated predicted probabilities are desired (and Frank Harrell argues that this is definitely the case in clinical care). I am not saying that we shouldn’t be preparing for machine learning, but they might still have something to learn at Mountain View.
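The distinction raised in the comment above, between ranking patients well and giving accurate probabilities, can be sketched in a few lines. The example below uses made-up data and hypothetical helper names (`concordance`, `brier_score`, `calibration_table`); it is not taken from the Google study. It shows that shifting all predicted risks upward leaves the ranking (and so the AUROC) unchanged while making the probabilities clearly worse.

```python
def concordance(labels, probs):
    """AUROC as the share of event/non-event pairs ranked correctly (ties = 0.5)."""
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    total = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return total / (len(pos) * len(neg))


def brier_score(labels, probs):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


def calibration_table(labels, probs, n_bins=4):
    """Per risk bin: (mean predicted risk, observed event rate, patients)."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        bins[min(int(p * n_bins), n_bins - 1)].append((y, p))
    return [
        (sum(p for _, p in b) / len(b), sum(y for y, _ in b) / len(b), len(b))
        for b in bins
        if b
    ]


# Hypothetical cohort: 10 low-risk patients (1 event), 10 high-risk (6 events).
labels = [1] + [0] * 9 + [1] * 6 + [0] * 4
calibrated = [0.1] * 10 + [0.6] * 10
# Same ranking, but every risk inflated by 0.35 (an assumed distortion).
inflated = [p + 0.35 for p in calibrated]

for name, probs in [("calibrated", calibrated), ("inflated", inflated)]:
    print(name, "AUROC:", round(concordance(labels, probs), 3),
          "Brier:", round(brier_score(labels, probs), 3))
    for mean_pred, observed, n in calibration_table(labels, probs):
        print("   predicted", round(mean_pred, 2), "observed", round(observed, 2), "n =", n)
```

Both sets of predictions discriminate equally well, but only the first tells a clinician that a predicted 10% risk corresponds to roughly 1 in 10 patients having the event.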

