ChatGPT Health fails to recognise medical emergencies in over half of cases, study finds

A study published in Nature Medicine has found that ChatGPT Health, OpenAI's AI health advice platform, fails to recommend hospital visits in more than half of cases where urgent medical care is needed. The first independent safety evaluation of the system, which launched in January, revealed it under-triaged 51.6% of emergency scenarios, advising patients to stay home or book routine appointments instead of seeking immediate care.

Researchers led by Dr Ashwin Ramaswamy created 60 realistic patient scenarios covering conditions from mild illnesses to emergencies. Three independent doctors reviewed each case and agreed on the necessary level of care. The team then asked ChatGPT Health for advice, generating nearly 1,000 responses under different conditions, such as varying the patient's gender, adding test results, or including comments from family members. The platform's recommendations were compared against the doctors' assessments.

While ChatGPT Health performed well in textbook emergencies such as stroke or severe allergic reactions, it struggled in other situations. In one asthma scenario, it advised waiting despite identifying early warning signs of respiratory failure. In 84% of simulations involving a suffocating woman, the platform directed her to a future appointment she would not live to see. Alex Ruani, a doctoral researcher at University College London, described the results as 'unbelievably dangerous', warning that a false sense of security could cost lives.

The platform also showed significant inconsistency in detecting suicidal ideation. When a 27-year-old patient described suicidal thoughts and nothing else, a crisis intervention banner appeared every time. However, when normal lab results were added to the same scenario, the banner vanished in all 16 attempts. Dr Ramaswamy noted that a crisis guardrail that depends on whether lab results are mentioned is 'arguably more dangerous than having no guardrail at all'.

OpenAI responded that the study did not reflect real-world usage and that the model is continuously updated. However, experts argue that the plausible risk of harm justifies stronger safeguards and independent oversight. The study highlights urgent concerns about the safety of AI-driven health advice, with calls for clear safety standards and independent auditing mechanisms.