Large Language Model Influence on Diagnostic Reasoning
Stanford University · VA Palo Alto Health Care System · +8 more institutions
Abstract
Importance: Large language models (LLMs) have shown promise in their performance on both multiple-choice and open-ended medical reasoning examinations, but it remains unknown whether the use of such tools improves physician diagnostic reasoning.
Objective: To assess the effect of an LLM on physicians' diagnostic reasoning compared with conventional resources.
Design, Setting, and Participants: A single-blind randomized clinical trial was conducted from November 29 to December 29, 2023. Physicians with training in family medicine, internal medicine, or emergency medicine were recruited across multiple academic medical institutions and participated either by remote video conferencing or in person.
Intervention: Participants were randomized, stratified by career stage, to either access to the LLM in addition to conventional diagnostic resources or conventional resources only. Participants were allocated 60 minutes to review up to 6 clinical vignettes.
Main Outcomes and Measures: The primary outcome was performance on a standardized rubric of diagnostic performance based on differential diagnosis accuracy, appropriateness of supporting and opposing factors, and next diagnostic evaluation steps, validated and graded via blinded expert consensus. Secondary outcomes included time spent per case (in seconds) and final diagnosis accuracy. All analyses followed the intention-to-treat principle. A secondary exploratory analysis evaluated the standalone performance of the LLM by comparing the primary outcomes between the LLM-alone group and the conventional resources group.
Citation impact
- FWCI: 220.69
- Percentile: 100%
- References: 41
Authors: 16
Topics & keywords
- Rubric
- Medicine
- Randomized controlled trial
- MEDLINE
- Intervention (counseling)
- Physical therapy
- Family medicine
- Medical physics