Analysis of AI cognitive threshold identifies cost-efficient strategy for health care implementation

Key Takeaways

  • High-capacity LLMs can cut API costs by as much as 17-fold when clinical tasks are grouped roughly 50 to a call.
  • More than 300,000 experiments showed that LLM performance drops under heavy cognitive load, underscoring the value of grouping tasks.

A study has identified a strategy for using AI models in health care that maintains efficiency while cutting costs by as much as 17-fold.

© Nuttapong punna - stock.adobe.com

Researchers from the Icahn School of Medicine at Mount Sinai conducted a study to identify cost-effective and efficient strategies for implementing large language models (LLMs) in health care systems. LLMs are a type of generative artificial intelligence (AI) trained on large amounts of data, which they are designed to analyze and understand.

The study, published in npj Digital Medicine, demonstrated effective strategies for implementing advanced AI in health care systems, saving time and money while ensuring reliable output. An economic analysis in the study reported a cost reduction of as much as 17-fold when multiple high-capacity LLMs (Llama-3-70b and GPT-4-turbo-128k, in this analysis) were operating at their maximum capacity of 50 simultaneous clinical tasks.

“Our findings provide a road map for health care systems to integrate advanced AI tools to automate tasks efficiently, potentially cutting costs for application programming interface (API) calls for LLMs up to 17-fold and ensuring stable performance under heavy workloads,” Girish N. Nadkarni, MD, MPH, co-senior author of the study, Irene and Dr. Arthur M. Fishberg professor of medicine at Icahn Mount Sinai, director of The Charles Bronfman Institute of Personalized Medicine and chief of the division of data-driven and digital medicine (D3M) at the Mount Sinai Health System, explained in a university release.

Researchers conducted the study to address concerns over a potential financial barrier to the widespread use of AI models in health care, given the large amounts of data that health systems generate each day.

“Our study was motivated by the need to find practical ways to reduce costs while maintaining performance so health systems can confidently use LLMs at scale,” said Eyal Klang, MD, first author of the study and director of the generative AI research program in the D3M at Icahn Mount Sinai. “We set out to ‘stress test’ these models, assessing how well they handle multiple tasks simultaneously, and to pinpoint strategies that keep both performance high and costs manageable.”

The team conducted more than 300,000 experiments over the course of the study, testing 10 different LLMs with real patient data and examining how each model responded to varying clinical questions. Across the experiments, the team incrementally increased the task load entrusted to each LLM to determine how well the models handled rising demands. In addition to tracking accuracy, the team monitored the models' adherence to clinical instructions.
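
To make the shape of such a load sweep concrete, the Python sketch below shows one way it could be scripted. It is an illustration only, not the study's actual protocol: the call_llm() client, the prompt format, the load levels, and the scoring are all placeholder assumptions.

# Illustrative load-sweep harness in the spirit of the study's "stress test".
# call_llm(), the prompt format, and the load levels are assumed placeholders;
# the actual experimental protocol and metrics are described in the paper.

def stress_test(call_llm, tasks, expected,
                load_levels=(1, 5, 10, 25, 50, 75, 100)):
    """Record accuracy and instruction adherence as the simultaneous task count grows."""
    results = {}
    for load in load_levels:
        group = tasks[:load]
        # Bundle `load` tasks into one request, asking for one answer per line.
        prompt = "Answer each numbered clinical task on its own line:\n" + "\n".join(
            f"{i + 1}. {task}" for i, task in enumerate(group)
        )
        answers = call_llm(prompt).splitlines()
        format_ok = len(answers) == len(group)  # did the model follow the instructions?
        correct = sum(a.strip() == e for a, e in zip(answers, expected[:load]))
        results[load] = {"accuracy": correct / len(group), "format_ok": format_ok}
    return results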

Researchers were surprised to find that advanced AI models showed signs of strain when pushed to their cognitive limits, with performance dropping unpredictably under pressure. The sweet spot identified in the study was about 50 simultaneous tasks, which the models could manage without any significant drop in accuracy. This finding suggests that hospitals and health care systems could group tasks together, optimizing workflows and reducing costs.

The economic analysis showed that, when operating at this sweet spot of 50 simultaneous tasks, health care systems could cut costs by as much as 17-fold without a significant drop in performance. This could save some larger health care systems millions of dollars per year.

Some examples of clinical tasks that were grouped together in the study include matching patients for clinical trials, reviewing medication safety, structuring research cohorts, extracting data for epidemiological studies and identifying patients eligible for preventive health screenings.
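
As a rough illustration of what grouping tasks like these into a single call might look like, here is a minimal Python sketch. It is an assumption for illustration only, not the study's implementation: call_llm(), the shared instruction text, and the group handling are placeholders.

# Minimal sketch, not the study's implementation: send clinical tasks in
# groups of up to 50 per API call instead of one call per task, so the fixed
# instruction text and per-call overhead are shared across the whole group.
# call_llm() is a placeholder for any LLM client.

BATCH_SIZE = 50  # the "sweet spot" reported in the study

def grouped_prompts(tasks, size=BATCH_SIZE):
    """Yield one prompt per group of tasks, all sharing a single set of instructions."""
    for start in range(0, len(tasks), size):
        group = tasks[start:start + size]
        numbered = "\n".join(f"{i + 1}. {task}" for i, task in enumerate(group))
        yield ("For each numbered clinical task below, return one answer per line.\n"
               + numbered)

def run_grouped(tasks, call_llm):
    """One API call per group of up to BATCH_SIZE tasks, rather than one per task."""
    return [call_llm(prompt) for prompt in grouped_prompts(tasks)]

Because every task in a group shares the same instructions and per-call overhead, the per-task cost falls roughly in proportion to the group size, which is the intuition behind the savings the authors report.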

“Recognizing the point at which these models begin to struggle under heavy cognitive loads is essential for maintaining reliability and operational stability,” Nadkarni said. “Our findings highlight a practical path for integrating generative AI in hospitals and open the door for further investigation of LLMs’ capabilities within real-world limitations.”

The team has expressed interest in further research into how these LLMs perform in real-time clinical environments, managing real patient workloads and interacting with health care teams. They also plan to conduct similar experiments on newly emerging models to identify whether improvements in technology lead to heightened cognitive thresholds.

“This research has significant implications for how AI can be integrated into health care systems. Grouping tasks for LLMs not only reduces costs but also conserves resources that can be better directed toward patient care,” said David L. Reich, MD, co-author of the study, chief clinical officer of the Mount Sinai Health System, president of the Mount Sinai Hospital and Mount Sinai Queens, Horace W. Goldsmith professor of anesthesiology and professor of AI, human health, pathology, molecular and cell-based medicine at Icahn Mount Sinai. “…By recognizing the cognitive limits of these models, health care providers can maximize AI utility while mitigating risks, ensuring that these tools remain a reliable support in critical health care settings.”
