Article · Nature Communications · Jan 14, 2025 · Gold OA

Large Language Models lack essential metacognition for reliable medical reasoning

UCLouvain · Cliniques Universitaires Saint-Luc

Indexed in: Crossref, DOAJ, PubMed

Abstract

Large Language Models have demonstrated expert-level accuracy on medical board examinations, suggesting potential for clinical decision support systems. However, their metacognitive abilities, crucial for medical decision-making, remain largely unexplored. To address this gap, we developed MetaMedQA, a benchmark incorporating confidence scores and metacognitive tasks into multiple-choice medical questions. We evaluated twelve models on dimensions including confidence-based accuracy, missing answer recall, and unknown recall. Despite high accuracy on multiple-choice questions, our study revealed significant metacognitive deficiencies across all tested models. Models consistently failed to recognize their…

Citation impact

  • Total citations: 76
  • FWCI: 88.87
  • Percentile: 100%
  • References: 48

Authors

4

Topics & keywords

Keywords
  • Metacognition
  • Recall
  • Computer science
  • Benchmark (surveying)
  • Cognitive psychology
  • Inclusion (mineral)
  • Psychology
  • Artificial intelligence

Funding