Article, Jan 1, 2023, Gold OA

Can Large Language Models Be an Alternative to Human Evaluations?

National Taiwan University

Indexed in Crossref

Abstract

Human evaluation is indispensable and inevitable for assessing the quality of texts generated by machine learning models or written by humans. However, human evaluation is very difficult to reproduce and its quality is notoriously unstable, hindering fair comparisons among different natural language processing (NLP) models and algorithms. Recently, large language models (LLMs) have demonstrated exceptional performance on unseen tasks when only the task instructions are provided. In this paper, we explore if such an ability of the LLMs can be used as an alternative to human evaluation. We present the LLMs with the exact same instructions, samples to be evaluated, and questions used to conduct human evaluation, and…
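The setup the abstract describes (giving an LLM the same instructions, sample, and question that a human rater would see) might be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the prompt layout, the 1 to 5 rating scale, and the names `build_prompt`, `parse_rating`, and `evaluate` are all assumptions, and `llm` stands in for whatever model call is actually used.

```python
import re
from typing import Callable, Optional


def build_prompt(instructions: str, sample: str, question: str) -> str:
    """Combine the rater instructions, the text under evaluation, and the question.

    The layout is a hypothetical one; the abstract does not specify a format.
    """
    return f"{instructions}\n\nText:\n{sample}\n\n{question}\nAnswer:"


def parse_rating(response: str, low: int = 1, high: int = 5) -> Optional[int]:
    """Return the first integer in the response that lies within [low, high].

    The 1-5 range is an assumed rating scale. Returns None if no valid
    rating is found.
    """
    for token in re.findall(r"\d+", response):
        value = int(token)
        if low <= value <= high:
            return value
    return None


def evaluate(
    llm: Callable[[str], str],
    instructions: str,
    sample: str,
    question: str,
) -> Optional[int]:
    """Ask the model the same question a human rater would be asked."""
    prompt = build_prompt(instructions, sample, question)
    return parse_rating(llm(prompt))
```

Passing any function that maps a prompt string to a response string as `llm` is enough to try the sketch; keeping the model behind a plain callable means the same instructions and questions can be reused unchanged across models.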

Citation impact

Total citations: 257
FWCI: 42.67
Percentile: 100%
References: 40

Authors

2

Topics & keywords

Keywords
  • Task (project management)
  • Computer science
  • Quality (philosophy)
  • Adversarial system
  • Artificial intelligence
  • Natural language processing
  • Data science
  • Epistemology
UN Sustainable Development Goals
  • Quality Education

Funding