In a head-to-head comparison, artificial intelligence gave more guideline-adherent and accurate recommendations than doctors in over 20% of cases.
Artificial intelligence (AI) may be ready to play an even larger role in primary care, according to a new study showing that AI-generated diagnoses and treatment recommendations in virtual urgent care visits were often more accurate than those made by physicians.
Researchers at Tel Aviv University and Cedars-Sinai Connect, a virtual urgent care clinic in Los Angeles, evaluated how an AI system stacked up against doctors in 461 patient visits involving common symptoms — respiratory, urinary, eye, vaginal and dental complaints. The findings, published in the Annals of Internal Medicine, suggest that AI has the potential to improve decision-making, especially in cases where clinical guidelines should steer care.
“In this study, we found that AI, based on a targeted intake process, can provide diagnostic and treatment recommendations that are, in many cases, more accurate than those made by physicians,” said Dan Zeltzer, PhD, a professor from the Berglas School of Economics at Tel Aviv University and the study’s lead author.
The AI system, developed by Israeli startup K Health, conducts structured intake chats, pulls data from patients’ medical records, and generates diagnostic and treatment suggestions when it is highly confident — about 80% of the time. Human clinicians then conduct telemedicine consultations to determine the final course of treatment.
To compare performance, a panel of experienced physicians evaluated both AI-generated and physician-delivered recommendations on a four-point scale: optimal, reasonable, inadequate or potentially harmful. The AI came out ahead.
AI recommendations were rated “optimal” in 77% of cases, compared with 67% for the treating physicians. The panel rated AI as “potentially harmful” in just 2.8% of cases, compared to 4.6% of the physicians’ decisions. In about 21% of cases, AI recommendations were rated better than those of doctors. The reverse was true in just 11% of cases.
The study authors noted several potential advantages of the AI system. According to an official university release, the AI adhered more strictly to clinical guidelines — such as avoiding antibiotics for viral infections — incorporated medical record data more comprehensively, and more precisely identified red flags, like eye pain in contact lens wearers, which could indicate infection.
Physicians, however, demonstrated strengths in clinical flexibility and real-time interpretation. For instance, the release described how a doctor might assess a COVID-19 patient’s reported shortness of breath as minor congestion, while the AI might err on the side of caution, referring the patient to the emergency room.
Of the 461 visits analyzed between June and July 2024, 262 (57%) showed full agreement between the AI and the treating physician. In cases of disagreement, the AI’s recommendations were rated as higher quality twice as often as the physician’s.
Still, researchers cautioned against drawing conclusions about real-world impact. It’s unknown whether physicians actually viewed or relied on the AI recommendations when making their decisions.
“One limitation of the study is that we do not know which of the physicians reviewed the AI's recommendations in the available chart, or to what extent they relied on these recommendations,” Zeltzer said. “Thus, the study only measured the accuracy of the algorithm’s recommendations and not their impact on the physicians.”
The study was funded by K Health and presented during the American College of Physicians (ACP) Internal Medicine Meeting 2025.
“The relatively common conditions included in our study represent about two-thirds of the clinic's case volume, and thus the findings can be meaningful for assessing AI's readiness to serve as a decision-support tool in medical practice,” Zeltzer said.
As AI tools continue to grow more sophisticated, questions remain about how best to integrate them into care delivery systems. This study suggests that practices exploring AI assistance — especially in virtual settings — may be looking at a future where machine-generated recommendations are not only helpful but, in many cases, more accurate than human judgment alone.
“We can envision a near future in which algorithms assist in an increasing portion of medical decisions, bringing certain data to the doctor's attention, and facilitating faster decisions with fewer human errors,” Zeltzer said. “Of course, many questions still remain about the best way to implement AI in the diagnostic and treatment process, as well as the optimal integration between human expertise and artificial intelligence in medicine.”