Empowering front-line physicians with AI: Evaluating large language models in everyday ENT care

Hack, Sholem; Zalzal, Habib G.; Attal, Rebecca; Farzad, Armin; Crew, Lilia Ann; Tessler, Idit; Frankel, Talya; Gvili, Ben; Shivatzki, Shaked; Wolfovitz, Amit; Rozendorn, Noa

doi:10.1016/j.ajem.2026.01.029

Article · The American Journal of Emergency Medicine · Jan 20, 2026 · Hybrid OA

Sheba Medical Center · Children's National · +3 more institutions

Indexed in: Crossref, PubMed

Abstract

Methods

Twelve clinical vignettes representing routine and urgent presentations were developed and validated by otolaryngologists. One hundred practicing physicians in family medicine and emergency medicine, including residents and attending physicians, completed all vignettes by providing a diagnosis, management plan, and referral decision. Four large language models (Gemini-2.0, ChatGPT-4.0, ChatGPT-5, and OpenEvidence) were tested using identical prompts. Model outputs were anonymized, randomized, and rated by a blinded expert panel using the Quality Analysis of Medical Artificial Intelligence tool, which assesses accuracy, clarity, completeness, sourcing, relevance, and usefulness.

Results

Physicians achieved a mean diagnostic accuracy of 91.6% and a mean management accuracy of 87.9%. In non-urgent cases, 30.4% of responses represented inappropriate referral. Only half of physicians recognized the need for urgent referral in a cerebrospinal fluid leak scenario. Large language models demonstrated comparable diagnostic and management accuracy with higher referral appropriateness.

Citation impact

Total citations: 4

FWCI: 41.86
Percentile: 100%
References: 42

Keywords

Otorhinolaryngology
Patient care
Language model
MEDLINE
Acute care
