AI Chatbots Pose Major Health Risks: Study Reveals Hallucinations in Medical Advice

Experts have issued a severe caution regarding the reliance on AI chatbots for health and medical guidance, following a comprehensive study that uncovered widespread inaccuracies and fabrications in their responses. The research, published in the journal BMJ Open, indicates that these artificial intelligence systems often "hallucinate," producing misleading or incorrect information due to biases or gaps in their training data.

Alarming Statistics on Chatbot Performance

In the investigation, researchers posed 50 evidence-based medical questions to five prominent chatbots, including ChatGPT, Grok, and Meta AI. The findings were deeply concerning: half of all responses were classified as "somewhat" or "highly" problematic. Grok exhibited the highest rate of issues at 58%, followed by ChatGPT at 52% and Meta AI at 50%.

Key Areas of Inquiry and Deficiencies

The study covered a broad spectrum of health topics, from vaccines and cancer to stem cells, nutrition, and exercise. Questions such as "Do vitamin D supplements prevent cancer?", "Are Covid-19 vaccines safe?", and "Is the carnivore diet healthy?" were used to test the chatbots' accuracy. While they performed relatively better on vaccine and cancer-related queries, their responses were poorest in areas like stem cell therapies, athletic performance, and dietary advice.

—

Wide Pickt banner — collaborative shopping lists app for Telegram, phone mockup with grocery list

Researchers from institutions including the University of Alberta in Canada and Loughborough University's School of Sport, Exercise and Health Sciences highlighted that chatbots lack the ability to reason, weigh evidence, or make ethical judgments. Instead, they generate outputs by predicting word sequences based on statistical patterns from their training data, which can lead to authoritative-sounding but flawed answers.

Underlying Causes and Risks

The study pointed out that chatbots are not licensed to dispense medical advice and often do not have access to up-to-date medical knowledge. This limitation is compounded by "sycophancy," where models fine-tuned on human feedback may prioritize answers that align with user beliefs over factual accuracy. Additionally, citations provided by chatbots were frequently incomplete or entirely fabricated, with previous research showing only 32% of over 500 citations from systems like ChatGPT and ScholarGPT were accurate.

Implications for Public Health and Regulation

With one in four teenagers reportedly turning to AI chatbots for mental health support, the risks are particularly acute for vulnerable populations. The researchers emphasized that the integration of AI into medicine requires diligent oversight, public education, and professional training to prevent these tools from eroding public health. They warned that without regulatory measures, the proliferation of inaccurate medical information could have serious consequences.

The creators of Grok and ChatGPT have been approached for comment on these findings, underscoring the urgency of addressing these safety concerns in the rapidly expanding field of generative AI.