Current and future state of evaluation of large language models for medical summarization tasks
University of Wisconsin–Madison · University of Colorado Anschutz Medical Campus · +2 more institutions
Indexed in Crossref, DOAJ, PubMed
Abstract
Large Language Models have expanded the potential for clinical Natural Language Generation (NLG), presenting new opportunities to manage the vast amounts of medical text. However, their use in such high-stakes environments necessitates robust evaluation workflows. In this review, we investigated the current landscape of evaluation metrics for NLG in healthcare and proposed a future direction to address the resource constraints of expert human evaluation while balancing alignment with human judgments.
Citation impact
- Total citations: 47
- FWCI: 88.02
- Percentile: 100%
- References: 72
Authors: 10
Topics & keywords
Topics
Keywords
- Automatic summarization
- Natural language generation
- Workflow
- Computer science
- Unified Medical Language System
- Data science
- State (computer science)
- Resource (disambiguation)
UN Sustainable Development Goals
- Quality Education