7% of messages generated by AI were deemed unsafe
In a study published in The Lancet Digital Health, researchers from Mass General Brigham showed that Large Language Models (LLMs) hold promise for reducing physician workload and enhancing patient education. However, the study underscores the need for vigilant oversight, given the risks that come with LLM-generated communications.
Physicians today face mounting administrative burdens, which contribute significantly to burnout. To address this challenge, electronic health record (EHR) vendors have increasingly turned to generative AI to help clinicians compose patient messages. Despite the potential efficiency gains, questions have lingered about safety and clinical impact.
Lead author Dr. Danielle Bitterman, from the Artificial Intelligence in Medicine (AIM) Program at Mass General Brigham, said in a statement, "Generative AI has the potential to provide a 'best of both worlds' scenario of reducing burden on the clinician and better educating the patient." Concerns about the risks of LLM-generated messages, however, prompted the study.
The research team employed OpenAI’s GPT-4 to generate scenarios about cancer patients, each paired with a patient question, and to draft responses to those questions. Six radiation oncologists evaluated and edited the responses, unaware of their origin. Results indicated that while the LLM-generated responses were longer and more educational, they occasionally lacked directive instructions, potentially jeopardizing patient safety. Even so, 58% of the time the AI-generated messages required no editing at all.
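The paper does not publish its prompting pipeline, but the general pattern it describes is simple: an LLM drafts a reply to a patient question, and a clinician must review and edit the draft before anything reaches the patient. A minimal sketch of that pattern, using the OpenAI Python client, might look like the following. The prompt wording, model settings, and the function names draft_reply and send_after_review are illustrative assumptions, not the study's actual implementation.

```python
# Illustrative sketch only: an LLM drafts a patient-portal reply,
# and a mandatory clinician review gate decides what is sent.
# Prompt text and function names are assumptions, not the study's code.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def draft_reply(patient_question: str) -> str:
    """Ask GPT-4 for a draft reply to a patient's portal message."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {
                "role": "system",
                "content": (
                    "You draft replies to patient portal messages for a "
                    "radiation oncology clinic. Be clear and educational, "
                    "and give explicit instructions when care is urgent."
                ),
            },
            {"role": "user", "content": patient_question},
        ],
    )
    return response.choices[0].message.content


def send_after_review(draft: str, clinician_edit) -> str:
    """Human-in-the-loop gate: a clinician edits or rejects every draft."""
    approved = clinician_edit(draft)  # clinician may rewrite or reject
    if approved is None:
        raise ValueError("Draft rejected; compose the reply manually.")
    return approved
```

The detail the study stresses is the review gate: the drafts are a starting point for the clinician, not a finished message, which is exactly why the unedited-response risk figures below matter.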
On average, physician-drafted responses were shorter than the LLM's, and the edited LLM responses remained more similar to the original LLM drafts than to the physicians' own replies. Despite improvements in perceived efficiency, 7.1% of LLM-generated responses posed a risk to patients if left unedited, including 0.6% that could have been life-threatening.
Mass General Brigham is currently piloting the integration of generative AI into EHRs to draft patient portal message replies across its ambulatory practices.
Looking ahead, the researchers aim to gauge how patients perceive LLM-based communications and to examine how demographic factors influence LLM-generated responses, given known algorithmic biases. Bitterman emphasized the importance of continuous oversight, clinician training, and AI literacy in mitigating the risks of integrating AI into medicine.