A recent study compared the diagnostic performance of physicians referencing AI to those limited to conventional resources.
Researchers from the University of Minnesota Medical School, Stanford University, Beth Israel Deaconess Medical Center and the University of Virginia evaluated GPT-4, an artificial intelligence (AI) large language model (LLM), as a tool to assist physicians' diagnoses. The study, published in JAMA Network Open, found that giving physicians access to GPT-4 as a diagnostic aid did not significantly improve their clinical reasoning compared with physicians using only conventional resources, such as UpToDate and Google.
“The field of AI is expanding rapidly and impacting our lives inside and outside of medicine. It is important that we study these tools and understand how we best use them to improve the care we provide as well as the experience of providing it,” said Andrew Olson, M.D., a professor at the University of Minnesota Medical School and a hospitalist with M Health Fairview. “This study suggests that there are opportunities for further improvement in physician-AI collaboration in clinical practice.”
The study included 50 U.S.-licensed physicians practicing family medicine, internal medicine or emergency medicine. The median diagnostic reasoning score per case was 76% for the group with AI access and 74% for the group using only conventional resources. The AI group spent an average of 519 seconds per case, compared with 565 seconds for the conventional resources group.
The researchers concluded that access to GPT-4 did not significantly improve physicians' diagnostic reasoning, even though the LLM on its own outperformed both the clinicians using conventional diagnostic resources and the clinicians assisted by the program. These findings underscore the need for further research into how clinicians should be trained to use these tools.
The gap between the LLM's standalone performance and that of either physician group points to the training and development still needed to realize the full potential of physician-AI collaboration in clinical practice. At the forefront of these efforts, the four institutions behind the study announced ARiSE, a bi-coastal AI evaluation network designed to further evaluate generative AI outputs in healthcare.