ChatGPT Health performance in a structured test of triage recommendations
Mount Sinai Health System · University of Miami
Abstract
ChatGPT Health was launched in January 2026 as OpenAI's consumer health tool and has reached millions of users. Here we conducted a structured stress test of triage recommendations using 60 clinician-authored vignettes across 21 clinical domains under 16 factorial conditions, yielding 960 total responses. Performance followed an inverted U-shaped pattern, with the most dangerous failures concentrated at clinical extremes: nonurgent presentations (35%) and emergency conditions (48%). Among gold-standard emergencies, the system undertriaged 52% of cases, directing patients with diabetic ketoacidosis or impending respiratory failure to 24-48 h evaluation rather than the emergency department, while correctly…
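The response count in the abstract follows from crossing every vignette with every condition (60 × 16 = 960). The sketch below enumerates such a fully crossed design. It is illustrative only: the abstract does not name the factors, so the assumption that the 16 conditions come from four binary factors (2^4 = 16), the factor names, and the function `build_design` are all hypothetical.

```python
from itertools import product

N_VIGNETTES = 60  # number of clinician-authored vignettes, from the abstract

# Assumption: the 16 conditions are four crossed binary factors (2**4 = 16).
# The abstract does not name them, so these labels are placeholders.
FACTORS = ("factor_a", "factor_b", "factor_c", "factor_d")


def build_design(n_vignettes=N_VIGNETTES, factors=FACTORS):
    """Return one row per (vignette, condition) pair in a fully crossed design."""
    # Every on/off combination of the factors is one condition.
    conditions = [
        dict(zip(factors, levels))
        for levels in product((False, True), repeat=len(factors))
    ]
    # Pair each vignette with each condition.
    return [
        {"vignette_id": vignette_id, **condition}
        for vignette_id in range(1, n_vignettes + 1)
        for condition in conditions
    ]


design = build_design()
print(len(design))  # 60 vignettes x 16 conditions
```

Each row would correspond to one prompt sent to the system and one triage recommendation collected, which is how a design of this shape arrives at 960 responses.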
Citation impact
- FWCI: 142.74
- Percentile: 100%
- References: 16
Authors: 19
Topics & keywords
- Triage
- Test (biology)
- Emergency department
- Health care
- Intervention (counseling)
- Prospective cohort study
- Emergency medical services
- Zero hunger