Can Large Language Models Be an Alternative to Human Evaluations?

Chiang, Cheng-Han; Lee, Hung-yi

doi:10.18653/v1/2023.acl-long.870

Article, Jan 1, 2023, Gold OA

Cheng-Han Chiang, Hung-yi Lee

National Taiwan University

Indexed in Crossref

Abstract

Human evaluation is indispensable and inevitable for assessing the quality of texts generated by machine learning models or written by humans. However, human evaluation is very difficult to reproduce and its quality is notoriously unstable, hindering fair comparisons among different natural language processing (NLP) models and algorithms. Recently, large language models (LLMs) have demonstrated exceptional performance on unseen tasks when only the task instructions are provided. In this paper, we explore if such an ability of the LLMs can be used as an alternative to human evaluation. We present the LLMs with the exact same instructions, samples to be evaluated, and questions used to conduct human evaluation, and…

Citation impact

Total citations: 257
FWCI: 42.67
Percentile: 100%
References: 40

Authors: 2

Topics & keywords

Keywords

Task (project management)
Computer science
Quality (philosophy)
Adversarial system
Artificial intelligence
Natural language processing
Data science
Epistemology

UN Sustainable Development Goals

Quality Education

Funding

Delta Electronics