A Glimpse Into the Adjacent Possible: Incorporating AI Into Medical Science
The implementation of emerging technologies requires front-line users to figure out what to do with a new technology – how to adapt it to the problems they are actively trying to solve.
The most impactful use cases often are not immediately obvious – for example, Edison envisioned the phonograph would be predominantly used to record wills.
Moreover, effective adoption typically requires more than simply substituting new technology into processes built around legacy technology. For example, when factories first started using electric generators to replace steam power, there was minimal impact on productivity. It was only when the design of the factory was reimagined by entrepreneurs like Henry Ford (a redesign enabled by electricity) that the promised gains were realized.
It’s also important to consider what success looks like. PCR, a technique for amplifying what are often vanishingly small amounts of DNA, was developed by Kary Mullis, who received the 1993 Nobel Prize in Chemistry for his efforts. Adopted relatively quickly, PCR enabled advances from disease detection (e.g., for COVID) to molecular engineering.
Yet if you look around medical labs today, you won’t find a “Department of PCR” or a “PCR Center of Excellence.” In a sense, the lack of such exceptionalism is a measure of PCR’s success and impact. Today, PCR is organically incorporated into the way science is done. It’s a tool, like the telescope and the microscope, that can be used to enhance our exploration of nature.
Today, medical researchers are actively exploring how to utilize AI. Rather than investing the methodology with spiritual or magical properties, it is increasingly recognized as a tool — a powerful tool if applied thoughtfully — that scientists are incorporating into their study of nature.
AlphaFold, for example, is a deep learning tool that offers powerful predictions of a protein’s 3D structure from its underlying amino acid sequence. It is already routinely, and appreciatively, utilized by structural biologists. It’s become a powerful new addition to the armamentarium.
Now that AI in healthcare has hopefully transitioned past both the peak of inflated (and truly extravagant) expectations as well as the trough of despair, we seem to have at last arrived at the point where savvy scientists are using AI as another technique to pursue their questions.
For these researchers, AI (like PCR, like microscopy) is a valuable means, a tool used to solve a meaningful problem; AI is not (as in too many breathless early publications) an exalted end, where the use of AI is celebrated rather than any result it enabled – the “dancing bear” phenomenon I’ve described.
A recent paper, called to my attention by my long-time colleague Dr. Anthony Philippakis, a thoughtful physician-scientist and the chief data officer at the Broad Institute, offers an inspiring example of where AI in medicine may be headed.
The research he describes (and of which he’s a co-author) was led by MGH cardiologist Dr. Patrick Ellinor, whom I first met when he was a cardiology and electrophysiology fellow at MGH, at the start of my medical training.
Ellinor and his colleagues were interested in understanding the basis of aortic aneurysms, dilations of the large blood vessel that can lead to sudden death. The identification of genes associated with aortic dilation could potentially guide the development of future medicines, while also enabling the identification of patients at risk.
Previous work had identified several extremely rare alleles that, if present, unquestionably contribute to the development of aneurysms. Yet most patients who develop aneurysms don’t have any of these alleles.
Other researchers conducted a genome-wide association study (GWAS) to identify genetic variations (single nucleotide polymorphisms, or SNPs) associated with aortic abnormalities, based on data meticulously measured and recorded by echocardiography technicians; this work identified a dozen or so SNPs that could potentially contribute to disease.
A talented member of Ellinor’s group, Dr. James Pirruccello, had another approach in mind. Pirruccello wanted to leverage the UK Biobank, a massive collection of deep genetic and extensive phenotypic data available to researchers for analysis. For example, cardiac MRI studies were available for about 40,000 subjects. This treasure trove of phenotypic data could be paired with the genetic data associated with the same participants.
The elegance of Pirruccello’s approach was how he extracted the data he required from the MRI images. Manual annotation of 40,000 cMRI studies (each containing about 100 images) would be prohibitively demanding and expensive. Instead, Pirruccello trained an AI algorithm to assess aortic diameter, and, amazingly, he did so using a relatively small number of manually annotated images – 116 (92 in the initial training set, 24 in the validation set).
This approach was feasible because algorithms had previously been trained to do similar tasks. While millions of labeled images may be required to train a deep learning model from scratch, comparatively few are needed to adapt an established model to a related task. This is the principle of “transfer learning.”
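To make the idea concrete, here is a minimal sketch of transfer learning in PyTorch: a generic ResNet, pretrained on ordinary photographs, is adapted with a tiny hypothetical set of annotated frames to output a single number (aortic diameter). The model choice, data tensors, and training details are illustrative assumptions, not the pipeline the authors actually built.

```python
# Minimal transfer-learning sketch (PyTorch). Hypothetical data; illustrative only.
import torch
import torch.nn as nn
from torchvision import models

# Start from a network pretrained on a large, generic image corpus.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pretrained feature extractor; only the new head will be trained.
for p in backbone.parameters():
    p.requires_grad = False

# Swap the classification head for a single regression output
# (a predicted aortic diameter, in mm).
backbone.fc = nn.Linear(backbone.fc.in_features, 1)

# Hypothetical training set: 92 annotated frames with measured diameters.
images = torch.randn(92, 3, 224, 224)
diameters_mm = 25 + 20 * torch.rand(92, 1)

optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# A few passes over the small labeled set are enough to adapt the new head.
for epoch in range(5):
    predictions = backbone(images)
    loss = loss_fn(predictions, diameters_mm)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The point of the sketch is simply that the heavy lifting – learning general visual features – has already been done, so the annotation burden on the research team is modest.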
With the algorithm in place, Pirruccello was then able to turn it loose on the 40,000 or so cMRI studies. The team was essentially converting a binary variable (aortic aneurysm: yes/no) into a continuous variable (aortic diameter). That enabled a more sensitive GWAS. Indeed, just focusing on the ascending aorta, Ellinor’s team identified 82 independent genetic regions (loci) of interest, 75 of which were novel. These loci could potentially shed light on the pathophysiology of aortic aneurysm.
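To see why the continuous measurement matters, consider a toy association test: with a quantitative trait like aortic diameter, each variant can be assessed by a simple linear regression of diameter on allele dosage, capturing gradations of effect that a yes/no aneurysm label would discard. The data below are simulated, and real GWAS pipelines add covariates, relatedness corrections, and stringent multiple-testing thresholds.

```python
# Toy per-SNP association test on a simulated continuous phenotype.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 5000
dosage = rng.integers(0, 3, size=n)                        # 0/1/2 copies of the alternate allele
diameter = 32 + 0.4 * dosage + rng.normal(0, 3, size=n)    # aortic diameter in mm (simulated)

# Regress diameter on dosage; the slope is the per-allele effect size.
slope, intercept, r_value, p_value, std_err = stats.linregress(dosage, diameter)
print(f"effect: {slope:.2f} mm per allele, p = {p_value:.2g}")
```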
These SNPs were then used to generate a “polygenic risk score” – an approach that seeks to integrate the risk contributed by a number of different SNPs (as I’ve discussed here; see also here). In turn, this score was used to analyze nearly 400,000 UK Biobank participants to see if it might help predict aortic aneurysms.
Remarkably, subjects with a genetic risk score in the top 10% were found to be twice as likely to develop aortic aneurysms as participants in the other 90% of the population. This type of approach, in theory, could be used to identify patients at higher risk of aortic aneurysm, and presumably help guide prevention strategies, as well as help select patients for future clinical studies. The genetic data might also help identify promising therapeutic targets.
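Conceptually, a polygenic risk score is just a weighted sum: each participant’s count of risk alleles at every associated SNP, weighted by the GWAS effect sizes. The sketch below uses simulated dosages and weights to show the mechanics, including flagging the top decile of genetic risk; it is not the authors’ validated score.

```python
# Schematic polygenic risk score on simulated data.
import numpy as np

rng = np.random.default_rng(1)
n_people, n_snps = 10_000, 82
dosages = rng.integers(0, 3, size=(n_people, n_snps))   # 0/1/2 risk alleles per SNP
effect_sizes = rng.normal(0, 0.1, size=n_snps)          # per-allele weights (GWAS betas)

# One score per participant: allele dosages weighted by effect sizes.
prs = dosages @ effect_sizes

# Flag the top decile of genetic risk, as in the comparison described above.
top_decile = prs >= np.quantile(prs, 0.9)
print(f"{top_decile.sum()} of {n_people} participants fall in the top 10% of genetic risk")
```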
There are many lessons from this approach, including the value of large integrated genetic/phenotypic databases, the power of GWAS analyses and their potential for target identification, and the promise of polygenic risk score assessments.
But the most exciting lessons here involve the intelligent incorporation of deep learning to “parameterize phenotype,” as Philippakis explains. The idea is to elicit an important continuous variable from a collection of images.
Significantly, Ellinor’s critical GWAS analysis, integrating genetics and phenotype, didn’t involve deep learning – just comparatively staid analytics that geneticists have been doing for two decades; the approach is at this point relatively routine.
Similarly, the polygenic risk score calculation didn’t involve deep learning.
And the research certainly didn’t involve someone asking Watson, Jeopardy-style, to think hard and come up with genes involved in aortic aneurysms.
What was clever was how the researchers leveraged AI to generate the input phenotype used in the GWAS analysis.
I hope and expect we’ll see more of these types of “organic applications” of AI as the approach becomes both less exotic and more accessible, and establishes itself as a powerful enabling tool for thoughtful medical scientists.