Experimental AI technology outperforms doctors in taking medical histories, diagnosing conditions
Could a chatbot take medical histories and diagnose conditions better than a doctor? Results of a recent study raise that tantalizing possibility.
Researchers at Google and its subsidiary Google DeepMind tested Articulate Medical Intelligence Explorer (AMIE), an experimental medical chatbot developed by Google. Built on a large language model (LLM), AMIE was designed to take medical histories and diagnose conditions, with an emphasis on management reasoning, communication skills and empathy.
The experiment was a blinded study in which AMIE and board-certified primary care physicians (PCPs) held text-based consultations with patient actors across 149 different health care scenarios. Specialist physicians and the patient actors then rated the AMIE and physician consultations using a framework that covered clinically meaningful axes of performance, including history-taking, diagnostic accuracy, management reasoning, communication skills and empathy.
According to the results, which were posted as a preprint on arXiv.org, AMIE showed greater diagnostic accuracy and outperformed the physicians on 28 of the 32 axes rated by the specialist physicians and on 24 of the 26 axes rated by the patient actors.
The authors note that an important differentiator was the length and detail of AMIE’s responses, which suggested that more time had been spent preparing them. That would be consistent with previous findings that patient satisfaction increases with the time patients spend with their physician, they say.
The authors emphasize that their findings have important limitations. For example, the consultations were conducted via text messaging, a method “which permits large-scale LLM-patient interactions but is not representative of usual clinical practice,” they note. Still, “while further research is required before AMIE could be translated to real-world settings, the results represent a milestone towards conversational diagnostic AI,” they conclude.