Large language models encode clinical knowledge

Singhal, Karan; Azizi, Shekoofeh; Tu, Tao; Mahdavi, S. Sara; Lee, Jason; Chung, Hyung Won; Scales, Nathan; Tanwani, Ajay Kumar; Cole-Lewis, Heather; Pfohl, Stephen; Payne, Perry W.; Seneviratne, Martin; Gamble, Paul; Kelly, Christopher; Babiker, Abubakr; Schärli, Nathanael; Chowdhery, Aakanksha; Mansfield, P.; Demner‐Fushman, Dina; Agüera y Arcas, Blaise; Webster, Dale R.; Corrado, Greg S.; Matias, Yossi; Chou, Katherine; Gottweis, Juraj; Tomašev, Nenad; Liu, Yun; Rajkomar, Alvin; Barral, Joëlle; Semturs, Christopher; Karthikesalingam, Alan; Natarajan, Vivek

doi:10.1038/s41586-023-06291-2

Article · Nature · Jul 12, 2023 · Hybrid OA

Google (United States) · United States National Library of Medicine · +1 more institution

Indexed in: Crossref, PubMed

Abstract

Large language models (LLMs) have demonstrated impressive capabilities, but the bar for clinical applications is high. Attempts to assess the clinical knowledge of models typically rely on automated evaluations based on limited benchmarks. Here, to address these limitations, we present MultiMedQA, a benchmark combining six existing medical question answering datasets spanning professional medicine, research and consumer queries and a new dataset of medical questions searched online, HealthSearchQA. We propose a human evaluation framework for model answers along multiple axes including factuality, comprehension, reasoning, possible harm and bias. In addition, we evaluate Pathways Language Model 1…

Citation impact

3,039 total citations
FWCI: 502.22
Percentile: 100%
References: 91

Authors: 32

Keywords

Computer science
Benchmark (surveying)
Language model
Comprehension
Artificial intelligence
Harm
Key (lock)
Unified Medical Language System

UN Sustainable Development Goals

Quality Education

Funding

DeepMind