Evaluation and mitigation of the limitations of large language models in clinical decision-making

Hager, Paul; Jungmann, Friederike; Holland, Robbie; Bhagat, Kunal; Hubrecht, Inga; Knauer, Manuel; Vielhauer, Jakob; Makowski, Marcus R.; Braren, Rickmer; Kaissis, Georgios; Rueckert, Daniel

doi:10.1038/s41591-024-03097-1

Article · Nature Medicine · Jul 4, 2024 · Hybrid OA

TUM Klinikum · Technical University of Munich · +4 more institutions

Indexed in: Crossref, PubMed

Abstract

Clinical decision-making is one of the most impactful parts of a physician's responsibilities and stands to benefit greatly from artificial intelligence solutions and large language models (LLMs) in particular. However, while LLMs have achieved excellent performance on medical licensing exams, these tests fail to assess many skills necessary for deployment in a realistic clinical decision-making environment, including gathering information, adhering to guidelines, and integrating into clinical workflows. Here we have created a curated dataset based on the Medical Information Mart for Intensive Care database spanning 2,400 real patient cases and four common abdominal pathologies as well as a framework to…

Citation impact

Total citations: 534

FWCI: 56.84
Percentile: 100%
References: 72

Authors: 11

Topics & keywords

Keywords

Clinical decision making
Computer science
Intensive care medicine
Risk analysis (engineering)
Medicine
Management science
Engineering

UN Sustainable Development Goals

Peace, Justice and Strong Institutions

Funding

Technische Universität München