LLM-based NLG Evaluation: Current Status and Challenges

Gao, Mingqi; Hu, Xinyu; Yin, Xunjian; Ruan, Jie; Pu, Xiao; Wan, Xiaojun

doi:10.1162/coli_a_00561

Article · Computational Linguistics · Jan 1, 2025 · Diamond OA

King University · Peking University

Indexed in: Crossref, DOAJ

Abstract

Evaluating natural language generation (NLG) is a vital but challenging problem in natural language processing. Traditional evaluation metrics, which mainly capture content (e.g., n-gram) overlap between system outputs and references, are far from satisfactory, and large language models (LLMs) such as ChatGPT have demonstrated great potential in NLG evaluation in recent years. Various automatic evaluation methods based on LLMs have been proposed, including metrics derived from LLMs, prompting LLMs, fine-tuning LLMs, and human–LLM collaborative evaluation. In this survey, we first give a taxonomy of LLM-based NLG evaluation methods, and discuss their pros and cons, respectively. Lastly, we discuss several…

Citation impact

Total citations: 53

FWCI: 50.74
Percentile: 100%
References: 126

Authors: 6

Topics & keywords

Keywords

Computer science
Current (fluid)
Data science
