Is ChatGPT a Good NLG Evaluator? A Preliminary Study

Wang, Jiaan; Liang, Yunlong; Meng, Fandong; Sun, Zengkui; Shi, Haoxiang; Li, Zhixu; Xu, Jinan; Qu, Jianfeng; Zhou, Jie

doi:10.18653/v1/2023.newsum-1.1

Article · Jan 1, 2023 · Gold OA

Soochow University · Beijing Jiaotong University · +3 more institutions

Indexed in Crossref

Abstract

Recently, the emergence of ChatGPT has attracted wide attention from the computational linguistics community. Many prior studies have shown that ChatGPT achieves remarkable performance on various NLP tasks in terms of automatic evaluation metrics. However, the ability of ChatGPT to serve as an evaluation metric is still underexplored. Considering that assessing the quality of natural language generation (NLG) models is an arduous task and that NLG metrics notoriously correlate poorly with human judgments, we ask whether ChatGPT is a good NLG evaluation metric. In this report, we provide a preliminary meta-evaluation of ChatGPT to show its reliability as an NLG metric. In detail, we regard ChatGPT as a…

Citation impact

Total citations: 217

FWCI: 35.87
Percentile: 100%
References: 42

Authors: 9

Topics & keywords

Keywords

Automatic summarization
Natural language generation
Metric (unit)
Computer science
Relevance (law)
Task (project management)
Artificial intelligence
Natural language processing

UN Sustainable Development Goals

Quality Education
