AI models evaluated for accuracy in providing preventive medicine recommendations fare poorly compared to humans
Recent studies have highlighted gaps in access to preventive services among Americans aged 35 and older, with only 8% receiving all recommended preventive services. As patients increasingly seek supplementary information beyond their physicians' advice, the reliability of online resources grows in importance.
A recent study in the American Journal of Preventive Medicine sought to evaluate the accuracy of two prominent AI models, ChatGPT-4 and Bard, in providing recommendations related to preventive medicine and primary care. Fifty-six questions covering screening guidelines, preventive strategies, and disease management were posed to these AI models. Each response generated was independently reviewed by two physicians and categorized as accurate, accurate with missing information, or inaccurate. The study aimed to assess the reliability of AI tools as supplementary resources in health care.
The findings revealed varying degrees of accuracy in the responses provided by ChatGPT-4 and Bard. ChatGPT-4 demonstrated 28.6% accurate responses, 42.8% accurate with missing information, and 28.6% inaccurate responses. In contrast, Bard exhibited 53.6% accurate responses, 28.6% accurate with missing information, and 17.8% inaccurate responses.
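As a rough sanity check (not part of the study itself), the reported percentages can be reconciled with the 56-question total. The per-category question counts below are inferred from the percentages, not stated in the article:

```python
# Back-compute approximate question counts from the reported percentages.
# Counts are inferred (e.g. 28.6% of 56 ≈ 16 questions), not given in the study.
TOTAL_QUESTIONS = 56

reported = {
    # [accurate, accurate with missing information, inaccurate]
    "ChatGPT-4": [28.6, 42.8, 28.6],
    "Bard":      [53.6, 28.6, 17.8],
}

for model, pcts in reported.items():
    # Each model's three categories should account for (nearly) 100% of questions.
    assert abs(sum(pcts) - 100.0) < 0.2, model
    # Recover whole-question counts from each percentage.
    counts = [round(p / 100 * TOTAL_QUESTIONS) for p in pcts]
    assert sum(counts) == TOTAL_QUESTIONS, model
    print(model, counts)
```

Each percentage maps cleanly onto a whole number of the 56 questions (16/24/16 for ChatGPT-4, 30/16/10 for Bard), so the figures are internally consistent.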
Notably, both AI models struggled with immunization-related questions, where considerable inaccuracies were observed. Additionally, ChatGPT-4's knowledge base was limited by outdated recommendations, highlighting the importance of continuous updates in AI systems. Bard, which can draw on continually updated information, demonstrated higher accuracy rates, albeit with room for improvement in certain areas, according to the study.
While AI tools can provide valuable information, particularly for patient education, they should not be seen as replacements for physicians, the researchers say. The study emphasized that AI tools should serve as supplementary resources rather than sole sources of medical advice, and it underscores the need for ongoing evaluation and updating of these tools to ensure their effectiveness and relevance in health care settings.
While AI models like ChatGPT-4 and Bard show promise in providing medical recommendations, their accuracy and relevance need improvement, according to the study authors. Continuous evaluation and updating are crucial to enhancing their performance and ensuring their effective application in health care. Future studies should assess AI systems' performance in real-life settings and incorporate broader question formulations to account for evolving medical practices and guidelines.