An LLM-based chatbot reached an incorrect diagnosis in 83 out of 100 pediatric case challenges, a study finds.
A large language model (LLM)-based chatbot gave the wrong diagnosis in the majority of pediatric cases, finds a recent study published in JAMA Pediatrics. ChatGPT version 3.5 reached an incorrect diagnosis in 83 out of 100 pediatric case challenges: 72% of its diagnoses were incorrect, and a further 11% were clinically related but too broad to be considered correct.
Despite the chatbot's high error rate, the researchers urged physicians to continue investigating the applications of LLMs in medicine. “LLMs and chatbots have potential as an administrative tool for physicians, demonstrating proficiency in writing research articles and generating patient instructions,” the authors said.
The chatbot's underwhelming diagnostic performance in the study underscores the invaluable role that clinical experience plays. “The chatbot evaluated in this study, unlike physicians, was not able to identify some relationships, such as that between autism and vitamin deficiencies. To improve the generative AI chatbot’s diagnostic accuracy, more selective training is likely required,” they further noted.
One of the shortcomings of some LLMs and chatbots is a lack of real-time access to medical information. “This prevents some chatbots from staying updated with new research, diagnostic criteria, and current health trends or disease outbreaks,” stated the study.
“Some suggest that physicians must take a more active role in generating data sets for LLMs to intentionally prepare them for medical functions — a process broadly referred to as tuning. This presents an opportunity for researchers to investigate if specific medical data training and tuning can improve the diagnostic accuracy of LLM-based chatbots,” the authors said.
An earlier study investigated the diagnostic accuracy of ChatGPT version 4 and found that the artificial intelligence (AI) chatbot rendered a correct diagnosis in 39% of New England Journal of Medicine (NEJM) case challenges. Researchers concur that there is high potential to use LLM-based chatbots as a supplementary tool for clinicians in diagnosing complex cases and developing a differential list for them.