Can large language models reason about medical questions?
Technical University of Denmark · Copenhagen University Hospital
Abstract
Although large language models often produce impressive outputs, it remains unclear how they perform in real-world scenarios requiring strong reasoning skills and expert domain knowledge. We set out to investigate whether closed- and open-source models (GPT-3.5, Llama 2, etc.) can be applied to answer and reason about difficult real-world-based questions. We focus on three popular medical benchmarks (MedQA-US Medical Licensing Examination [USMLE], MedMCQA, and PubMedQA) and multiple prompting scenarios: chain of thought (CoT; think step by step), few shot, and retrieval augmentation. Based on an expert annotation of the generated CoTs, we found that InstructGPT can often read, reason, and recall expert…
Citation impact
- FWCI: 66.85
- Percentile: 100%
- References: 102
Authors (4)
- Valentin Liévin (corresponding author)
Technical University of Denmark
- Christoffer Hother
Copenhagen University Hospital, Rigshospitalet
- Andreas Geert Motzfeldt
Technical University of Denmark
- Ole Winther
University of Copenhagen, Copenhagen University Hospital, Rigshospitalet, Technical University of Denmark
Topics & keywords
- Computer science
- Closing (real estate)
- Set (abstract data type)
- Annotation
- Artificial intelligence
- Natural language processing
- Programming language
- Quality Education