Article · npj Digital Medicine · Jul 23, 2024 · Gold OA

Hidden flaws behind expert-level accuracy of multimodal GPT-4 vision in medicine

National Institutes of Health · United States National Library of Medicine · and 16 more institutions

Indexed in Crossref, DOAJ, and PubMed

Abstract

Recent studies indicate that Generative Pre-trained Transformer 4 with Vision (GPT-4V) outperforms human physicians on medical challenge tasks. However, these evaluations focused primarily on multiple-choice accuracy alone. Our study extends the current scope with a comprehensive analysis of GPT-4V's rationales, covering image comprehension, recall of medical knowledge, and step-by-step multimodal reasoning, when solving New England Journal of Medicine (NEJM) Image Challenges, an imaging quiz designed to test the knowledge and diagnostic capabilities of medical professionals. Evaluation results confirmed that GPT-4V performs comparably to human physicians in multiple-choice accuracy (81.6%…


Funding