Towards Expert-Level Medical Question Answering with Large Language Models

Singhal, Karan; Tu, Tao; Gottweis, Juraj; Sayres, Rory; Wulczyn, Ellery; Hou, Le; Clark, Kevin; Pfohl, Stephen; Cole-Lewis, Heather; Neal, Darlene; Schaekermann, Mike; Wang, Amy; Amin, Mohamed; Lachgar, Sami; Mansfield, P.; Prakash, Sushant; Green, Bradley; Dominowska, Ewa; Agüera y Arcas, Blaise; Tomašev, Nenad; Liu, Yun; Wong, Renee; Semturs, Christopher; Mahdavi, S. Sara; Barral, Joëlle; Webster, Dale A.; Corrado, Greg S.; Matias, Yossi; Azizi, Shekoofeh; Karthikesalingam, Alan; Natarajan, Vivek

doi:10.48550/arxiv.2305.09617

Preprint, arXiv (Cornell University), May 16, 2023, Green OA

Indexed in: arXiv, DataCite

Abstract

Recent artificial intelligence (AI) systems have reached milestones in "grand challenges" ranging from Go to protein-folding. The capability to retrieve medical knowledge, reason over it, and answer medical questions comparably to physicians has long been viewed as one such grand challenge. Large language models (LLMs) have catalyzed significant progress in medical question answering; Med-PaLM was the first model to exceed a "passing" score in US Medical Licensing Examination (USMLE) style questions with a score of 67.2% on the MedQA dataset. However, this and other prior work suggested significant room for improvement, especially when models' answers were compared to clinicians' answers. Here we present…

Citation impact

Total citations: 334

References: 0

Authors: 31

Topics & keywords

Keywords

Ranking (information retrieval)
Pairwise comparison
Computer science
Artificial intelligence
Question answering
Machine learning
Medical education
Natural language processing

UN Sustainable Development Goals

Quality Education