AI Outperforms Doctors in Emergency Triage, Harvard Study Finds

A groundbreaking study from Harvard has demonstrated that artificial intelligence systems can outperform human doctors in high-pressure emergency medicine triage, diagnosing more accurately in critical moments when patients are first rushed to hospital. The findings, published in the journal Science, were hailed by independent experts as a significant advancement in AI clinical reasoning.

Study Details and Results

One experiment focused on 76 patients arriving at a Boston hospital emergency room. An AI and two human doctors were given the same standard electronic health record, including vital signs, demographics, and a brief nurse note. The AI, OpenAI's o1 reasoning model, identified the exact diagnosis or a very close one in 67% of cases, while the two human doctors achieved 50% to 55% accuracy.

The AI's advantage was most pronounced in triage situations requiring rapid decisions with minimal information. When more detailed data was available, the AI's accuracy rose to 82%, compared with 70% to 79% for the expert doctors, though this difference was not statistically significant.

Long-Term Treatment Planning

In another test, the AI and 46 doctors were asked to develop longer-term treatment plans for five clinical case studies, covering decisions such as antibiotic regimens and end-of-life processes. The AI scored 89%, significantly outperforming the doctors, who scored 34% using conventional resources such as search engines.

Expert Commentary

Dr. Arjun Manrai, lead author and head of an AI lab at Harvard Medical School, stated: "I don't think our findings mean that AI replaces doctors. I think it does mean that we're witnessing a really profound change in technology that will reshape medicine."

Dr. Adam Rodman, another lead author and a physician at Boston's Beth Israel Deaconess Medical Centre, added that large language models (LLMs) are among "the most impactful technologies in decades." He envisions a new "triadic care model" involving the doctor, the patient, and an AI system.

Case Study: Lupus Diagnosis

In one illustrative case, a patient presented with a blood clot in the lungs and worsening symptoms. The human doctors suspected that the anticoagulants were failing, but the AI noted that the patient's history of lupus might be causing lung inflammation, a diagnosis that was later confirmed.

Current Usage and Concerns

Nearly one in five US physicians already use AI for diagnosis, according to recent research. In the UK, 16% of doctors use AI daily and 15% weekly, with clinical decision-making being a common application, per a Royal College of Physicians survey. However, UK doctors expressed concerns about AI error and liability risks.

Dr. Rodman acknowledged the lack of a formal accountability framework, stressing that patients ultimately want human guidance in life-or-death decisions.

Prof. Ewen Harrison, co-director of the University of Edinburgh's Centre for Medical Informatics, called the study important, noting that AI systems "are starting to look like useful second-opinion tools for clinicians, particularly when it is important to consider a wider range of possible diagnoses."

Dr. Wei Xing from the University of Sheffield cautioned that some findings suggest doctors may unconsciously defer to AI answers, a tendency that could grow as AI becomes routine. He also highlighted the lack of information on which patient groups the AI struggled with, such as elderly patients or non-English speakers, and emphasized that the study does not prove AI is safe for routine clinical use.