Why Learning From Electronic Health Records Is So Appealing – And So Hard
The application of technology to medicine offers the promise of better, more intelligent care; yet success has proved elusive.
To understand why, we will first consider the broad ambition of the “learning health system,” then the general challenges presented by electronic health records (EHRs), and finally the complexity of a topical use case: a consortium’s effort to use EHR data to advance the understanding of COVID-19.
EHRs and the Learning Health System Ideal
For years, the ability to use information from EHRs has been regarded as the cornerstone of the “learning health system” (LHS), where the “feedback gap” I’ve described is successfully closed, providers are able to efficiently learn from their shared experiences, and the standard of care is iteratively improved. The concept, first introduced in 2007, has been celebrated worldwide, and is generally viewed as the ideal towards which care systems should aspire.
Unfortunately, as a 2016 systematic review of the literature by Norwegian researchers Andrius Budrionis and Johan Bellika somewhat inconveniently revealed, the LHS “mostly remains described in theory,” universally venerated, yet rarely implemented.
Of the many published papers addressing the topic, only a handful described “actual implementations,” according to the review’s authors, who describe their findings as “rather alarming,” observing:
“It seems like the interest in exploiting the potential of LHS is global; however, it remains expressed in words rather than action. Only 13 publications present initial results and support the impact of LHS by implementation. Many emphasize the potential of the novel paradigm for healthcare delivery; however, empirical results are lacking. LHS aims to shorten the reported 17 year timespan required to put positive research results into practice. But how long does it take to adopt LHS itself?”
As Budrionis and Bellika archly conclude, “The LHS concept is reaching a level of maturity that puts pressure on impact evaluation” – or to translate from the clever prose of academia, “less talk, more action.” While acknowledging the many theoretical challenges associated with the effort, they argue – exactly as I have emphasized – for the prioritization of concrete, palpable utility. The challenges, they write, “may be easier to solve if the actual impact is clearly visible.”
At the core of the LHS is the EHR. As I’ve discussed, decoding phenotype – and more broadly, capturing phenotype at scale – represents a profound opportunity for clinical care and medical science, one that the EHR should enable. Getting there, however, isn’t easy.
The difficulty of actually using data from EHRs was highlighted in a pointed 2013 review by two Columbia University bioinformaticists, George Hripcsak and David Albers.
While noting that the “national push for EHRs” makes an “unprecedented amount of clinical information available for research,” the authors explain that extracting value is remarkably hard. For instance, EHR datasets are generally fragmentary and incomplete, recording only select aspects of a patient’s interaction with a given care center; to the extent it’s even possible to reconstruct a patient’s longitudinal journey, such a “time series …is very far from the rigorous data collection normally employed in formal experiments.”
EHR data are also notoriously inaccurate, and many of the errors are systematic rather than random – reflecting, for example, influence “by billing requirements and the avoidance of liability,” according to Hripcsak and Albers. Sometimes, erroneous data are reflexively entered and subsequently copied. For instance, the authors point out, in one EHR database, “2% of patients who were missing one eye were documented in a narrative note as being PERRLA [a common acronym for a normal bilateral eye exam] – an impossibility.”
The authors also cite the complexity of healthcare, and note the challenge of discerning clinical reasoning from computable EHR data. Moreover, “healthcare data reflect a complex set of processes with many feedback loops,” with the ultimate effect that “the EHR is not a direct reflection of the patient and physiology, but a reflection of the recording process inherent in healthcare with noise and feedback loops.”
(To this point: Harvard researcher Griffin Weber and colleagues demonstrated in 2018 that the presence and timing of many laboratory test orders – rather than the results of these tests – were actually more accurate in predicting survival, highlighting the importance of understanding the care processes associated with data in a given EHR.)
Hripcsak and Albers conclude that “the full challenge of phenotyping is not broadly recognized,” and note that while “interoperability and privacy” are often cited as key challenges, the more difficult problems arguably involve the foundational data that make up the EHR.
A Timely Example: EHR Data For COVID-19
A just-published research paper nicely captures both the hopes and challenges of leveraging hospital medical records for evidence generation.
The 4CE consortium is an international group of nearly a hundred hospitals, across five countries, that formed in response to COVID-19, with the goal of leveraging electronic health record (EHR) information to provide timely clinical and public health information about the pandemic. Most of the group were already participating in an ongoing collaborative effort around EHR use built around the “i2b2” platform, and thus were poised to turn their collective attention to the coronavirus challenge.
Data sharing consortia tend to use one of two approaches: either de-identified data are transferred to a common site and analyzed there collectively (this is also the approach used in traditional, multi-site randomized clinical trials), or the analysis is initially performed locally, in a distributed fashion, using agreed-upon protocols, and the results are then aggregated. 4CE utilizes the distributed model, which individual hospitals often view as offering greater protection to their EHR data.
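To make the distributed model concrete, here is a minimal sketch in Python. It is not 4CE's actual protocol; the names SiteSummary, summarize_locally, and pool, and the "severe" field, are hypothetical. The point it illustrates is simply that patient-level rows stay inside each hospital and only aggregate counts are shared.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteSummary:
    """Aggregate counts a site agrees to share (hypothetical schema)."""
    site: str
    n_patients: int
    n_severe: int


def summarize_locally(site, patient_rows):
    """Runs inside one hospital: patient-level rows never leave the site."""
    n_severe = sum(1 for row in patient_rows if row["severe"])
    return SiteSummary(site, len(patient_rows), n_severe)


def pool(summaries):
    """Runs at the coordinating site: combines counts only."""
    total = sum(s.n_patients for s in summaries)
    severe = sum(s.n_severe for s in summaries)
    return severe / total if total else float("nan")


# Illustrative, made-up data for two sites.
site_a = [{"severe": True}, {"severe": False}, {"severe": False}, {"severe": False}]
site_b = [{"severe": True}, {"severe": True}] + [{"severe": False}] * 4

summaries = [summarize_locally("A", site_a), summarize_locally("B", site_b)]
pooled_rate = pool(summaries)  # 3 severe out of 10 patients
```

In the centralized alternative, the rows themselves would be shipped to one place and summarized there; the trade-off is more analytic flexibility in exchange for moving de-identified patient data off site.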
The just-published research sought to evaluate the disease course and outcomes of hospitalized COVID-19 patients who were severely ill – i.e., those who required admission to an intensive care unit and/or died. Yet such seemingly basic information was “not readily available in all environments,” the investigators report – meaning that at many hospitals, it was not possible to reliably extract these data from the EHR. For example, especially during the early crush of COVID-19, many traditional medical floors (and on occasion, even hallways) were converted into ad-hoc ICUs, so “standardized EHR data elements such as ‘transfer to ICU,’” the authors note, often “would not be properly recorded.”
To get around this, the researchers devised a way to triangulate severity based on a combination of medication codes, diagnosis codes, lab test orders, and procedure codes – information accessible in the EHRs of all participating hospitals. Such an approach is called a “computable phenotype,” and is often used to assess a clinical state when there isn’t a direct measure reliably recorded in the medical record. According to the authors, “reliable mentions of diseases are rare in the clinic record, and individual diagnosis codes are mediocre predictors of the actual presence of a disease.”
Think about what this means for a moment: although it would seem that there’s nothing more fundamental in medicine than the diagnosis of disease, figuring out from the EHR whether a patient truly has a disease can be surprisingly challenging.
It gets worse. A computable phenotype turns out to be challenging to define as well, even if this is done (as ideally it should be) with input from astute clinicians. “A phenotype can make sense clinically yet have poor performance due to coding anomalies and variation between sites,” the authors explain, emphasizing the importance of validating the computable phenotype.
Thus, the entire purpose of the initial study was to develop and validate a computable phenotype to assess whether or not a particular patient’s hospital course was consistent with severe disease, a formula that could be applied effectively at each individual hospital.
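For readers who want a feel for what such a formula looks like, here is a deliberately simplified Python sketch. It is an assumption-laden illustration, not the consortium's definition: the code labels in SEVERITY_CODES are invented placeholders rather than real medication, diagnosis, lab, or procedure codes, and the "at least one matching category" rule is my own stand-in for whatever logic the researchers settled on.

```python
# Hypothetical placeholder labels, NOT real clinical codes or 4CE's code lists.
SEVERITY_CODES = {
    "medication": {"MED:vasopressor", "MED:iv_sedative"},
    "diagnosis": {"DX:respiratory_failure"},
    "lab_order": {"LAB:arterial_blood_gas"},
    "procedure": {"PROC:mechanical_ventilation"},
}


def matched_categories(patient_codes):
    """Return, sorted, the code categories in which the patient has
    at least one code from the severity lists."""
    return sorted(
        category
        for category, severe_codes in SEVERITY_CODES.items()
        if severe_codes & patient_codes.get(category, set())
    )


def is_severe(patient_codes, min_categories=1):
    """Flag a hospitalization as severe if enough categories match.
    The threshold is an illustrative assumption."""
    return len(matched_categories(patient_codes)) >= min_categories


# Made-up patients: codes recorded during a hospitalization, by category.
ventilated = {
    "medication": {"MED:iv_sedative", "MED:acetaminophen"},
    "procedure": {"PROC:mechanical_ventilation"},
}
mild = {"medication": {"MED:acetaminophen"}, "lab_order": {"LAB:cbc"}}

ventilated_flag = is_severe(ventilated)
mild_flag = is_severe(mild)
```

Note that nothing here asks whether the patient was "in the ICU." The phenotype infers severity indirectly, from the footprints that intensive care tends to leave in the record, which is exactly why its performance depends so heavily on how each hospital codes.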
Ultimately, the group was able to develop a computable phenotype that had reasonably good characteristics (compared to the best available gold standard, ideally the [painstaking] process of manual review of patient charts) across all the test sites. In the process of constructing the algorithm, the researchers were struck by the differences in the use of standardized lab test, diagnosis, and procedure codes across the many hospitals, highlighting the challenge inherent in efforts to utilize EHR data from multiple care systems. Billing codes denoting ICU admission, the authors note, were particularly imprecise; many ICU stays were missed.
The exceptional amount of work required to robustly determine, from EHR data, a patient attribute as seemingly basic as the severity of a COVID hospitalization, and the complexity associated with this undertaking, highlight the gap between the apparent richness and bounty of EHR data and the incredible difficulty of extracting even fairly rudimentary insights.
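Validation of this kind generally comes down to comparing the phenotype's flags with chart-review labels for the same patients. The sketch below shows the standard arithmetic in Python; the function name validate and the example labels are hypothetical, and the numbers are invented for illustration rather than drawn from the study.

```python
def validate(predicted, chart_review):
    """Compare phenotype flags with chart-review labels (both lists of bools)
    and return sensitivity, specificity, and positive predictive value."""
    if len(predicted) != len(chart_review):
        raise ValueError("predicted and chart_review must be the same length")

    pairs = list(zip(predicted, chart_review))
    tp = sum(1 for p, t in pairs if p and t)          # flagged, truly severe
    fp = sum(1 for p, t in pairs if p and not t)      # flagged, not severe
    fn = sum(1 for p, t in pairs if not p and t)      # missed severe cases
    tn = sum(1 for p, t in pairs if not p and not t)  # correctly unflagged

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
    }


# Made-up labels for eight patients at one site.
phenotype_flags = [True, True, True, False, False, False, False, False]
chart_labels = [True, True, False, True, False, False, False, False]

metrics = validate(phenotype_flags, chart_labels)
```

Running the same comparison separately at each hospital is what exposes site-to-site variation: a rule that performs well where ventilation is coded consistently may miss many severe cases where it is not.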
(In this context, it is perhaps easier to appreciate why many informaticists were immediately skeptical of the improbably rich EHR-derived datasets underlying two high-profile COVID publications this summer – both of which were subsequently retracted, as I’ve discussed.)
Significantly, in a point that is often overlooked, the success the 4CE researchers were able to achieve, they note, was buttressed by local clinical expertise – specifically, doctors at the individual hospitals “who understood the vagaries of hospital coding” and helped improve “data extraction and analysis, thereby contributing to the data quality of the 4CE initiative.”
The importance of such expertise was explicitly highlighted by consortium authors in an earlier paper as well: “Most importantly,” they write, “at each site there were biomedical informatics experts who understood both the technical characteristics of the data and their clinical relevance.”
The information in electronic health records offers great promise in the care of patients and the understanding of disease, but the exceptional difficulty of fulfilling this promise is widely underappreciated.
While improved technologies will almost certainly prove useful, it’s critical to understand the often very local clinical care processes through which EHR data are generated. It was Amy Abernethy’s astute recognition of the primacy of high-quality, EHR-derived, clinical expert-curated datasets, for instance, that arguably created the fundamental value in Flatiron’s oncology data platform, subsequently acquired by Roche for around $2B. (Abernethy is now the Principal Deputy Commissioner of the FDA.)
The true value of EHR data lies not only, or even primarily, in the volume of data captured or computed. Rather, it’s in the ability to render, organize, and utilize these data in a fashion that captures and reflects the clinical circumstances and care processes associated with the data’s original generation.