AI in Practice

David Shaywitz
AI is here – everywhere, it seems. How are we doing at translating this extraordinary promise into palpable value?
Organizations
As utopian AI “Accelerationists” have battled catastrophizing “Doomers” over competing visions of which eschaton is likely to be immanentized, a less visible but perhaps more consequential constituency has quietly focused on applying the still-evolving technology to the intractable problems of the present.
These doers — understated, practical and ridiculously stubborn — are the subject of a compelling new book by Josh Tyrangiel, AI for Good, which I recently reviewed in the WSJ. (Sadly, my original lede — the one that starts this section — didn’t survive the editorial scalpel.)
Tyrangiel’s key points will all sound familiar to readers of this column: AI implementation is not plug and play — it requires deliberate effort and domain expertise. Critical players include lead users on the organization side focused on actual problems to solve, as well as a capable tech partner with a high-EQ translator who can inhabit those problems and turn them into technical specs for developers.
Even with all these in place, getting new technology to do what you want is inherently challenging, as a very well-intentioned collaboration between OpenAI and Khan Academy reveals, and is often further complicated by organizational antibodies, which aspiring innovators need to manage.
Approaches to managing this resistance range from racing to take advantage of the temporary loosening of bureaucratic hurdles during a crisis (Gen. Gustave Perna of Operation Warp Speed) to working under the radar like former IRS Commissioner Danny Werfel, who advanced his agency’s AI capabilities under the cover of its reputation as “slow and boring and technologically hopeless.”
Still, resistance is everywhere, and medicine earns a particular callout. “Hospitals,” Tyrangiel observes, “may have exalted missions, but they’re just as full of territorial jerks as investment banks.” Several examples from the Cleveland Clinic — sure to resonate with innovators in academic medical centers everywhere — bring the point home.
Individuals
A second AI book I reviewed in the Journal, I Am Not A Robot, by Joanna Stern, looks at how AI is impacting, and is likely (in the near future) to impact, the lives of individuals. Again, the issue isn’t so much the technology itself as how it’s used. AI to improve early cancer detection in mammography: good; AI to use colorful overlays to upsell patients on marginal dental procedures: less good. Household robots: not ready for prime time – although Stern reports AI massage robots deliver “some great ass work.”
Minds and hearts, naturally, are of particular interest, as both learning and love involve discovery & struggle, negotiation & compromise – and Stern counsels that we should be wary of AI bearing effortless solutions.
AI in Biopharma Discovery
Veteran Journal reporter Peter Loftus recently discussed the application of AI to biopharma with characteristic thoughtfulness, drawing conclusions that, again, will not surprise regular readers of this column.
We should separate out some highly questionable claims, such as the suggestion that Takeda’s TYK2 inhibitor, acquired from Nimbus, was discovered using AI; as physician-scientist and healthtech investor Patrick Malone noted three years ago, the medicine “was discovered using a type of computer-aided drug discovery called structure-based drug design,” which qualifies as AI only to the extent that “AI is a catch-all term for methods that use computers to solve a task.”
What seems most clear is that the lion’s share of meaningful progress has been in operations; as one Lilly executive notes, “The reality is where we’ve seen all the benefits in AI so far is not actually in drug discovery — it’s actually in the rest of the process.”
As readers of this column recognize, this is what several pharma leaders – including the CEOs of Lilly and Novartis – have acknowledged for years.
To be sure, many of the same companies that were previously persuaded by management consultants that the secret to radically improved R&D efficiency involved building and then posing in front of an enormous data dashboard have now apparently been convinced that R&D deliverance (or at least an opportune distraction) requires constructing a ginormous supercomputer that goes up to 11 (or to 12, in the case of Roche’s, evidently).
More substantively, key challenges limiting the impact of AI in R&D, as Andreas Bender and colleagues (including Jack Scannell and me) describe in a forthcoming publication, include:
Data quality and character. Biological data has three properties that limit AI’s impact: scarcity (few endpoints that truly matter vs. abundant proxy data), fragility (labels like “works” or “binds” are highly conditional, not categorical), and epistemic opacity (biological complexity limits the line of sight between proxy measures and outcomes we actually care about).
Confusing map and territory. Models that look promising in journals often function poorly in actual R&D contexts — a problem described compellingly in a recent Nature Medicine publication by Azad, Krumholz, and Saria in the related context of clinical care.
AI in Biopharma Development
There’s a fascinating new proposal by two authors familiar to TR readers: Ziad Obermeyer, an emergency room physician and health services researcher (profiled in TR here), and Katherine Baicker, a health economics scholar, Provost of the University of Chicago, and co-author of a prominent recent study debunking a range of approaches to workplace wellness (discussed in TR here).
In their new paper, the authors start by acknowledging the urgent need for surrogate endpoints for clinical trials, essentially to accelerate knowledge turns. TR readers obviously get this and wrestle with it every day. The issue, of course, has been exactly the “epistemic opacity” discussed above – we generally don’t understand as well as we need to the connection between what we can measure and the endpoints we actually care about.
Consequently, when we suggest seemingly sensible surrogate endpoints, they often fail. Obermeyer and Baicker bring the painful receipts, captured by Obermeyer in this post (part of an excellent thread on the topic).

Instead of using our flawed causal understanding to select a candidate marker, the authors suggest that AI models trained on comprehensive, readily available physiological data (“ECGs, imaging, laboratory results, and clinical notes”) can generate a “surrogate index” – an empirical, composite, predicted probability of long-term clinical outcomes that shifts proportionally to a drug’s true treatment effect.
It’s an approach that reminds me of polygenic risk scores (this is a useful PRS explainer from Eric Topol at Ground Truths), only more dynamic – and it seems extremely promising, albeit with a number of caveats, as the authors highlight.
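To make the idea concrete, here is a deliberately stripped-down sketch of what a surrogate-index workflow might look like. To be clear, this is not the authors’ actual pipeline; the data, features, and model choice below are entirely synthetic stand-ins.

```python
# Illustrative sketch of a "surrogate index" workflow; hypothetical
# stand-in data and model, not Obermeyer & Baicker's actual method.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Step 1: On large historical datasets, learn to predict the long-term
# outcome we actually care about from readily available early measurements
# (stand-ins here for ECG features, labs, imaging-derived values).
n_hist = 20_000
X_hist = rng.normal(size=(n_hist, 10))      # early physiological features
true_risk = 1 / (1 + np.exp(-(X_hist[:, 0] + 0.5 * X_hist[:, 1])))
y_hist = rng.binomial(1, true_risk)         # long-term outcome (e.g., 5-year event)
index_model = GradientBoostingClassifier().fit(X_hist, y_hist)

# Step 2: In a new trial, score each participant's early post-treatment
# measurements; the predicted probability is the surrogate index.
n_trial = 1_000
treated = rng.binomial(1, 0.5, size=n_trial).astype(bool)
X_trial = rng.normal(size=(n_trial, 10))
X_trial[treated, 0] -= 0.3                  # drug shifts an upstream feature
surrogate_index = index_model.predict_proba(X_trial)[:, 1]

# Step 3: The between-arm difference in the surrogate index serves as an
# early, composite estimate of the drug's effect on the true endpoint.
effect = surrogate_index[treated].mean() - surrogate_index[~treated].mean()
print(f"Estimated effect on long-term outcome risk: {effect:+.3f}")
```

The essential inversion, per the authors: the index is learned empirically from historical outcome data, rather than selected on the basis of our (often flawed) causal intuitions about which single marker should matter.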
A particularly important challenge – one I’ve engaged with since my very first pharma role in experimental medicine at Merck two decades ago – is that the pragmatic (vs. academic) utility of a surrogate marker requires having sufficient confidence in it to inflect the direction of a drug development program (a surrogate marker the FDA accepts as a proxy for a traditional endpoint is an even higher bar).
Essentially, you need to be able to kill a beloved program based on the results of the surrogate marker in early studies – and for all sorts of organizational reasons, this is brutally difficult to achieve. But if AI-derived markers earn a reputation for being sufficiently robust, they could be embraced as the authors propose, meaningfully accelerate development, and (beyond what the authors write) perhaps even win at least provisional acceptance from the FDA as a surrogate endpoint.
AI in Medicine
A recent Ground Truths post by Eric Topol describes a fascinating — and concerning — dichotomy between the inexplicably slow general adoption of task-specific deep learning, which has proven its value in healthcare with some rigor, and the remarkably rapid general adoption of generative AI, a newer and more versatile generation of deep learning that still urgently needs more rigorous validation.
The need for more robust evidence around genAI aligns with a just-published Nature Medicine editorial, “Show us the evidence for the value of medical AI,” as well as a plea from leading practitioners such as Raj Manrai of Harvard’s Department of Biomedical Informatics (my home department).
Manrai writes that we need “[p]rospective clinical trials. Health systems investing in infrastructure now. Monitoring frameworks that track not just diagnostic accuracy but safety, efficiency, and cost.” He adds, “The science has reached a point where trials are justified.”
Topol’s strongest examples of the delayed implementation of deep learning applications come from imaging. The technology can extract clinically relevant information from scans obtained for other reasons: cardiovascular risk from chest X-rays or mammograms, osteoporosis risk from both chest films and retinal images (!), body composition and metabolic risk from CT or MRI, liver disease from echocardiograms, and systemic disease risk, again, from retinal images (as Topol points out, the retina appears to be an unusually rich source of signal detectable by AI).
This is the promise of opportunistic AI: making better use of data medicine already collects, without additional radiation or inconvenience, and potentially at minimal additional cost.
The key implementation question is how to turn that additional signal into better care. Some outputs — calibrated risk estimates, bone-density measures, body-composition phenotypes — may fit naturally into established prevention pathways. Others, especially potential cancers, nodules, or occult lesions, raise the familiar screening concerns: false positives, downstream testing, overdiagnosis, anxiety, and uncertain net benefit.
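To picture what that routing might look like in software, here is a purely illustrative sketch; the finding categories, thresholds, and actions are hypothetical, invented for this example rather than drawn from any of the work cited here, and certainly not clinical guidance.

```python
# Hypothetical routing of opportunistic AI findings into care pathways.
# Categories, thresholds, and actions are invented for illustration only.
from dataclasses import dataclass

@dataclass
class Finding:
    kind: str     # e.g., "cv_risk", "bone_density", "incidental_lesion"
    score: float  # model output, assumed calibrated to [0, 1]

def route(finding: Finding) -> str:
    """Map an AI-derived signal to a next step in an established pathway."""
    if finding.kind == "cv_risk":
        # Calibrated risk estimates can slot into existing prevention pathways.
        return "flag for prevention workup" if finding.score >= 0.1 else "routine follow-up"
    if finding.kind == "bone_density":
        return "order confirmatory DEXA" if finding.score >= 0.5 else "no action"
    if finding.kind == "incidental_lesion":
        # Potential cancers carry false-positive and overdiagnosis risk:
        # require human review before triggering any downstream testing.
        return "radiologist review before any workup"
    return "no defined pathway; log for monitoring"

print(route(Finding("cv_risk", 0.12)))  # -> "flag for prevention workup"
```

The design point is the asymmetry: outputs with established downstream pathways can be handled nearly automatically, while findings that raise the screening concerns above deserve a deliberate human gate.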
The real opportunity is to convert AI-derived signals into clinically useful intelligence, echoing Zak Kohane’s aspirational description of medicine as “fundamentally an information- and data-processing discipline.”
Adds Kohane, the Chair of Harvard’s Department of Biomedical Informatics, “How you distill data into knowledge, and how you take that knowledge and put it in practice, is at the heart of medical science today.”
Several colleagues and I have been thinking about how to make AI-enabled monitoring more context-aware and clinically useful; a related perspective is currently under peer review.
Bottom Line
Biopharma and healthcare, like the rest of the world, are actively engaged with the challenge of figuring out where and how AI can be usefully applied. As AI for Good shows, implementation is difficult and requires deliberate work and the right people. We should use the technology deliberately and strategically, rather than embrace it reflexively or mindlessly; we also shouldn’t underestimate its exceptional promise, even as we foreground the significant concerns inherent in its development. In pharma, using AI to accelerate discovery is challenging and so far has arguably been more succès d’estime – and succès de tech VC dollars – than transformative progress. Meanwhile, AI-enabled surrogate indices represent a promising (but still unproven) approach to accelerating clinical trials. And evidence suggests the application of deep learning to medical imaging can provide important insights – value that traditional medicine has largely overlooked, and a market opportunity thoughtful entrepreneurs are sure to recognize.