The Costs of Implementing Healthcare AI
Preparing cancer patients for difficult decisions about treatment and end-of-life care is part of an oncologist's job, but those conversations don't always happen. At the University of Pennsylvania Health System, a predictive algorithm estimates a patient's likelihood of death and nudges doctors to start the discussion, with the aim of improving patient care.
But the tool needs its own checkups. A 2022 study found that its accuracy in predicting mortality declined by 7 percentage points during the COVID-19 pandemic.
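Catching that kind of slippage requires recomputing a model's accuracy as outcomes accrue. Below is a minimal sketch of what such monitoring could look like, assuming a simple record format and an AUC-based check; the names and tolerance are illustrative, not Penn Medicine's actual pipeline.

```python
# Minimal drift-monitoring sketch: recompute a mortality model's AUC
# per time period and flag periods that fall below a baseline.
# Record format, names, and tolerance are illustrative assumptions.
from sklearn.metrics import roc_auc_score

def audit_by_period(records, baseline_auc, tolerance=0.05):
    """records: iterable of (period, predicted_risk, died) tuples."""
    by_period = {}
    for period, risk, died in records:
        by_period.setdefault(period, ([], []))
        by_period[period][0].append(died)
        by_period[period][1].append(risk)
    alerts = []
    for period, (outcomes, risks) in sorted(by_period.items()):
        if len(set(outcomes)) < 2:
            continue  # AUC is undefined when only one class is present
        auc = roc_auc_score(outcomes, risks)
        if auc < baseline_auc - tolerance:
            alerts.append((period, auc))  # performance has slipped
    return alerts
```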
That decline has serious implications, says Ravi Parikh, an oncologist at Emory University and the study's lead author. The tool stopped reliably alerting physicians when those crucial discussions were needed, potentially leaving some patients on unnecessary chemotherapy when a different approach would have served them better. Parikh suspects the problem is not isolated to the algorithm at Penn Medicine: many healthcare institutions do not consistently monitor the performance of their AI tools.
Such malfunctions illustrate a problem that hospital executives and researchers are increasingly confronting: AI systems require ongoing monitoring and staffing to keep working as intended. In other words, these advanced tools need humans to supervise them.
“Many believe that AI will resolve issues of access, capacity, and overall care enhancement,” says Nigam Shah, chief data scientist at Stanford Health Care. “But if using AI increases care costs by 20%, is that a feasible solution?”
Government officials have expressed concerns that hospitals might lack the ability to adequately assess these technologies. FDA Commissioner Robert Califf recently remarked, “I have searched extensively, and I do not believe any health system in the U.S. is capable of validating an AI algorithm integrated into clinical care.”
AI is already ingrained in healthcare practices. Algorithms help predict patient risks, recommend diagnoses, prioritize care, automate visit summaries, and process insurance claims.
If tech advocates are to be believed, AI will become not only commonplace but lucrative. The investment firm Bessemer Venture Partners has identified around 20 healthcare AI startups projected to generate $10 million each in annual revenue, and the FDA has approved nearly a thousand AI products for healthcare applications.
However, measuring the effectiveness of these AI tools is challenging. Monitoring their ongoing performance – akin to checking whether a car has quietly developed mechanical problems – is even more complicated.
A recent study at Yale Medicine evaluated six “early warning systems,” which alert clinicians when patients are likely to deteriorate, and found stark differences in their performance. Completing the evaluation took a supercomputer running for several days.
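The Yale study's methodology isn't detailed here, but a simplified version of such a comparison could score every system on the same retrospective data and report precision and recall at a matched alert rate, as in the sketch below (all names and the 10% alert rate are assumptions):

```python
# Sketch: compare early-warning systems on shared retrospective data,
# each thresholded to alert on the same fraction of patients.
import numpy as np
from sklearn.metrics import precision_score, recall_score

def compare_at_alert_rate(deteriorated, scores_by_system, alert_rate=0.1):
    """deteriorated: 0/1 outcomes; scores_by_system: {name: risk scores}."""
    results = {}
    for name, scores in scores_by_system.items():
        scores = np.asarray(scores)
        cutoff = np.quantile(scores, 1 - alert_rate)  # top slice alerts
        alerts = (scores >= cutoff).astype(int)
        results[name] = {
            "precision": precision_score(deteriorated, alerts),
            "recall": recall_score(deteriorated, alerts),
        }
    return results
```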
For hospitals, identifying the algorithms best suited to their specific needs is burdensome: most doctors lack access to supercomputing resources, and there is no Consumer Reports-style review of AI tools to consult.
“We lack standardized metrics,” states Jesse Ehrenfeld, the immediate past president of the American Medical Association. “Currently, I cannot refer to any standard for evaluating or monitoring the performance of AI-enabled models once they are deployed.”
One prevalent AI application in healthcare settings is ambient documentation: a tech-driven assistant that listens to a patient visit and summarizes it. Last year, investors poured $353 million into companies supplying these documentation technologies. However, Ehrenfeld warns, “No standards exist right now for comparing the outputs of these tools.”
This matters because even minor errors can have significant consequences. A team at Stanford University investigated the use of large language models – the technology behind AI tools like ChatGPT – to summarize patients’ medical histories, and found a 35% error rate even in the best cases. As Shah puts it, “When you’re summarizing a medical case, omitting a key term like ‘fever’ can lead to severe issues.”
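A crude safeguard against exactly that failure, sketched below rather than taken from the Stanford team's method, is to check whether clinically critical terms in the source note survive into the summary; the term list is hypothetical:

```python
# Sketch: flag summaries that drop critical terms found in the source note.
# The term list is a hypothetical illustration, not a clinical standard.
CRITICAL_TERMS = {"fever", "sepsis", "chest pain", "allergy"}

def missing_critical_terms(source_note: str, summary: str) -> set:
    """Return critical terms present in the note but absent from the summary."""
    note, summ = source_note.lower(), summary.lower()
    return {term for term in CRITICAL_TERMS if term in note and term not in summ}

# Usage: if missing_critical_terms(note, llm_summary) is non-empty,
# route the summary to a human for review before it enters the chart.
```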
Occasionally, algorithmic failures have logical explanations. For instance, changes in data sources, like when hospitals switch laboratory providers, can diminish the effectiveness of an AI system. In other cases, problems arise without clear reasons.
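When the cause is an upstream data change, routine statistical checks on a model's inputs can raise an early alarm. The sketch below compares recent lab values against a reference window using a two-sample Kolmogorov-Smirnov test; the significance threshold is an arbitrary illustration:

```python
# Sketch: detect input drift after an upstream change (e.g., a new lab
# provider) by testing whether recent values match a reference window.
from scipy.stats import ks_2samp

def input_drifted(reference_values, recent_values, alpha=0.01):
    """Returns (drifted, statistic); a small p-value suggests the input
    distribution shifted and model predictions may no longer hold up."""
    statistic, p_value = ks_2samp(reference_values, recent_values)
    return p_value < alpha, statistic
```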
Sandy Aronson, a tech executive at Mass General Brigham, discussed a trial involving software designed to assist genetic counselors in finding relevant information about DNA variants. The product displayed “nondeterminism,” meaning it produced varying results when the same question was posed multiple times in quick succession. While Aronson is optimistic about the potential of language models to aid genetic counselors, he recognizes that the technology has room for growth.
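Nondeterminism of that kind is at least straightforward to probe: pose the identical question repeatedly and count the distinct answers. In the sketch below, query_model is a hypothetical stand-in for whatever interface the system under test exposes:

```python
# Sketch: probe a model for nondeterminism by repeating one question.
# `query_model` is a hypothetical callable, not a real API.
from collections import Counter

def nondeterminism_probe(query_model, question: str, trials: int = 10):
    """Count distinct answers to identical queries; more than one
    distinct answer means the system is nondeterministic."""
    return Counter(query_model(question) for _ in range(trials))
```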
Given the lack of metrics and standards and the unpredictability of errors, healthcare institutions are left to invest considerable resources of their own. At Stanford, Shah says, auditing just two models for fairness and reliability took eight to ten months and 115 person-hours.
Some experts propose using AI to monitor AI, with human data specialists overseeing both systems. While promising, this approach would require even more money, a significant hurdle for healthcare institutions facing tight budgets and a limited supply of AI specialists.
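What that might look like in practice remains an open question. One speculative shape, sketched below with entirely hypothetical callables, is a reviewer model that critiques a generator model's output and escalates disagreements to a human specialist:

```python
# Speculative sketch of "AI monitoring AI": a reviewer model critiques
# a generator model's output; disagreements go to a human specialist.
# `summarize`, `review`, and `escalate` are hypothetical callables.
def monitored_output(note, summarize, review, escalate):
    summary = summarize(note)
    verdict = review(note, summary)      # e.g., "ok" or a list of concerns
    if verdict != "ok":
        return escalate(note, summary, verdict)  # human-in-the-loop path
    return summary
```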
“It’s great to envision a future where we have AI overseeing AI, but is that what we truly want? How many more people are we going to require?” Shah asked.