Can large language models reason about medical questions?

Liévin, Valentin; Hother, Christoffer; Motzfeldt, Andreas Geert; Winther, Ole

doi:10.1016/j.patter.2024.100943

Article · Patterns · Mar 1, 2024 · Gold OA

Technical University of Denmark · Copenhagen University Hospital · +2 more institutions

Indexed in: Crossref, DOAJ, PubMed

Abstract

Although large language models often produce impressive outputs, it remains unclear how they perform in real-world scenarios requiring strong reasoning skills and expert domain knowledge. We set out to investigate whether closed- and open-source models (GPT-3.5, Llama 2, etc.) can be applied to answer and reason about difficult real-world-based questions. We focus on three popular medical benchmarks (MedQA-US Medical Licensing Examination [USMLE], MedMCQA, and PubMedQA) and multiple prompting scenarios: chain of thought (CoT; think step by step), few shot, and retrieval augmentation. Based on an expert annotation of the generated CoTs, we found that InstructGPT can often read, reason, and recall expert…
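The abstract names three prompting scenarios: zero-shot chain of thought ("think step by step"), few-shot prompting, and retrieval augmentation. The following is a minimal sketch of how such prompts might be assembled for a multiple-choice question. It assumes hypothetical templates. The trigger wording in `COT_TRIGGER`, the `Question:`/`Context:`/`Answer:` labels, and every function name are illustrative choices and are not the authors' actual prompts or code.

```python
import re
from typing import List, Optional, Sequence, Tuple

# Hypothetical CoT trigger; the paper's exact prompt templates are not reproduced here.
COT_TRIGGER = "Let's think step by step."
LETTERS = "ABCDEFGHIJ"


def format_question(question: str, options: Sequence[str]) -> str:
    """Render a multiple-choice question with lettered options (A, B, C, ...)."""
    lines = [f"Question: {question}"]
    lines += [f"{LETTERS[i]}) {opt}" for i, opt in enumerate(options)]
    return "\n".join(lines)


def zero_shot_cot_prompt(question: str, options: Sequence[str]) -> str:
    """Zero-shot CoT: append a step-by-step trigger where the answer should begin."""
    return format_question(question, options) + f"\n\nAnswer: {COT_TRIGGER}"


def few_shot_prompt(
    examples: List[Tuple[str, Sequence[str], str, str]],
    question: str,
    options: Sequence[str],
) -> str:
    """Few-shot CoT: prepend worked examples (question, options, reasoning, answer letter)."""
    blocks = [
        format_question(q, opts)
        + f"\n\nAnswer: {COT_TRIGGER} {reasoning} Therefore, the answer is {ans}."
        for q, opts, reasoning, ans in examples
    ]
    blocks.append(zero_shot_cot_prompt(question, options))
    return "\n\n".join(blocks)


def retrieval_augmented_prompt(
    passages: Sequence[str], question: str, options: Sequence[str]
) -> str:
    """Retrieval augmentation: place retrieved passages before the question."""
    context = "\n".join(f"Context: {p}" for p in passages)
    return context + "\n\n" + zero_shot_cot_prompt(question, options)


def extract_answer(completion: str) -> Optional[str]:
    """Return the last 'answer is X' letter in a generated CoT, or None if absent."""
    matches = re.findall(r"answer is \(?([A-J])\)?", completion)
    return matches[-1] if matches else None
```

For example, `zero_shot_cot_prompt("Which vitamin deficiency causes scurvy?", ["A", "C"])` yields a prompt ending in the trigger sentence, and `extract_answer` pulls the final letter out of a completion such as "...so the answer is (B)." Parsing the last stated letter is only one possible heuristic; a real evaluation would need to handle completions that never commit to an option.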

Citation impact

Total citations: 248
FWCI: 66.85
Percentile: 100%
References: 102

Authors: 4

Topics & keywords

Keywords

Computer science
Closing (real estate)
Set (abstract data type)
Annotation
Artificial intelligence
Natural language processing
Programming language

UN Sustainable Development Goals

Quality Education
