Hidden flaws behind expert-level accuracy of multimodal GPT-4 vision in medicine

Jin, Qiao; Chen, Fangyuan; Zhou, Yiliang; Xu, Ziyang; Cheung, Justin M.; Chen, Robert F.; Summers, Ronald M.; Rousseau, Justin F.; Ni, Peiyun; Landsman, Marc; Baxter, Sally L.; Al’Aref, Subhi J.; Li, Yijia; Chen, Alexander; Brejt, Josef A.; Chiang, Michael F.; Peng, Yifan; Lu, Zhiyong

doi:10.1038/s41746-024-01185-7

Article · npj Digital Medicine · Jul 23, 2024 · Gold OA

National Institutes of Health · United States National Library of Medicine · +16 more institutions

Indexed in: Crossref, DOAJ, PubMed

Abstract

Recent studies indicate that Generative Pre-trained Transformer 4 with Vision (GPT-4V) outperforms human physicians in medical challenge tasks. However, these evaluations primarily focused on the accuracy of multiple-choice questions alone. Our study extends the current scope by conducting a comprehensive analysis of GPT-4V's rationales of image comprehension, recall of medical knowledge, and step-by-step multimodal reasoning when solving New England Journal of Medicine (NEJM) Image Challenges, an imaging quiz designed to test the knowledge and diagnostic capabilities of medical professionals. Evaluation results confirmed that GPT-4V performs comparably to human physicians regarding multiple-choice accuracy (81.6%…

Citation impact

Total citations: 107
FWCI: 11.39
Percentile: 100%
References: 18

Authors: 18

Topics & keywords

Keywords

Artificial intelligence
Computer vision
Computer science
Precision medicine
Medicine
Pathology
