Assessing the research landscape and clinical utility of large language models: a scoping review
University of New Brunswick · University of Toronto · +2 more institutions
Abstract
Large language models (LLMs) like OpenAI's ChatGPT are powerful generative systems that rapidly synthesize natural language responses. Research on LLMs has revealed their potential and pitfalls, especially in clinical settings. However, the evolving landscape of LLM research in medicine has left several gaps regarding their evaluation, application, and evidence base.
This scoping review aims to (1) summarize current research evidence on the accuracy and efficacy of LLMs in medical applications, (2) discuss the ethical, legal, logistical, and socioeconomic implications of LLM use in clinical settings, (3) explore barriers and facilitators to LLM implementation in healthcare, (4) propose a standardized evaluation framework for assessing LLMs' clinical utility, and (5) identify evidence gaps and propose future research directions for LLMs in clinical applications. EVIDENCE REVIEW: We screened 4,036 records from MEDLINE, EMBASE, CINAHL, medRxiv, bioRxiv, and arXiv from January 2023 (inception of the search) to June 26, 2023 for English-language papers and analyzed findings from 55 worldwide studies. Quality of evidence was reported based on the Oxford Centre for Evidence-based Medicine recommendations.
Citation impact
- FWCI
- 17.20
- Percentile
- 100%
- References
- 64
Authors
7Topics & keywords
- CINAHL
- MEDLINE
- Health care
- Medicine
- Socioeconomic status
- Political science
- Environmental health
- Population
- Peace, Justice and strong institutions