Large Language Models lack essential metacognition for reliable medical reasoning

Griot, Maxime; Hemptinne, Coralie; Vanderdonckt, Jean; Yüksel, Demet

doi:10.1038/s41467-024-55628-6

Article · Nature Communications · Jan 14, 2025 · Gold OA

UCLouvain · Cliniques Universitaires Saint-Luc

Indexed in: Crossref, DOAJ, PubMed

Abstract

Large Language Models have demonstrated expert-level accuracy on medical board examinations, suggesting potential for clinical decision support systems. However, their metacognitive abilities, crucial for medical decision-making, remain largely unexplored. To address this gap, we developed MetaMedQA, a benchmark incorporating confidence scores and metacognitive tasks into multiple-choice medical questions. We evaluated twelve models on dimensions including confidence-based accuracy, missing answer recall, and unknown recall. Despite high accuracy on multiple-choice questions, our study revealed significant metacognitive deficiencies across all tested models. Models consistently failed to recognize their…

Citation impact

Total citations: 76

FWCI: 88.87
Percentile: 100%
References: 48

Authors: 4

Topics & keywords

Keywords

Metacognition
Recall
Computer science
Benchmark (surveying)
Cognitive psychology
Inclusion (mineral)
Psychology
Artificial intelligence
