Current and future state of evaluation of large language models for medical summarization tasks

Croxford, Emma; Gao, Yanjun; Pellegrino, Nicholas; Wong, Karen; Wills, Graham; First, Elliot; Liao, Frank; Goswami, Cherodeep; Patterson, Brian W.; Afshar, Majid

doi:10.1038/s44401-024-00011-2

Article · npj Health Systems · Feb 3, 2025 · Diamond OA

University of Wisconsin–Madison · University of Colorado Anschutz Medical Campus · +2 more institutions

Indexed in: Crossref, DOAJ, PubMed

Abstract

Large Language Models have expanded the potential for clinical Natural Language Generation (NLG), presenting new opportunities to manage the vast amounts of medical text. However, their use in such high-stakes environments necessitates robust evaluation workflows. In this review, we investigated the current landscape of evaluation metrics for NLG in healthcare and proposed a future direction that addresses the resource constraints of expert human evaluation while maintaining alignment with human judgments.

Citation impact

Total citations: 47

FWCI: 88.02
Percentile: 100%
References: 72

Authors: 10

Topics & keywords

Keywords

Automatic summarization
Natural language generation
Workflow
Computer science
Unified Medical Language System
Data science
State (computer science)
Resource (disambiguation)

UN Sustainable Development Goals

Quality Education

Funding

U.S. National Library of Medicine
Awards: NLM 5T15LM007359, NIH/NLM R00 LM014308-02, R01LM012973