Article · Jan 1, 2023 · Gold OA
Can Large Language Models Be an Alternative to Human Evaluations?
Indexed in Crossref
Abstract
Human evaluation is indispensable and inevitable for assessing the quality of texts generated by machine learning models or written by humans. However, human evaluation is very difficult to reproduce and its quality is notoriously unstable, hindering fair comparisons among different natural language processing (NLP) models and algorithms. Recently, large language models (LLMs) have demonstrated exceptional performance on unseen tasks when only the task instructions are provided. In this paper, we explore if such an ability of the LLMs can be used as an alternative to human evaluation. We present the LLMs with the exact same instructions, samples to be evaluated, and questions used to conduct human evaluation, and…
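The abstract's core idea is to reuse the human-evaluation materials (instructions, sample, question) verbatim as an LLM prompt. The following is a minimal sketch of that loop, not the authors' code. It assumes a 1 to 5 Likert-style question and a caller-supplied `generate` function standing in for whatever LLM is queried. The prompt layout and the helper names `build_eval_prompt`, `parse_rating`, and `llm_rate` are hypothetical.

```python
import re
from statistics import mean
from typing import Callable, Iterable, Optional


def build_eval_prompt(instructions: str, sample: str, question: str) -> str:
    """Assemble a prompt from the same three pieces a human rater would see.

    Hypothetical layout; the 1-5 rating range is an assumption.
    """
    return (
        f"{instructions.strip()}\n\n"
        f"Text:\n{sample.strip()}\n\n"
        f"Question: {question.strip()} (Answer with a rating from 1 to 5.)\n"
        f"Rating:"
    )


def parse_rating(response: str, low: int = 1, high: int = 5) -> Optional[int]:
    """Return the first integer in [low, high] found in the reply, else None."""
    for token in re.findall(r"\d+", response):
        value = int(token)
        if low <= value <= high:
            return value
    return None


def llm_rate(
    samples: Iterable[str],
    instructions: str,
    question: str,
    generate: Callable[[str], str],
) -> Optional[float]:
    """Average the parsed ratings over all samples.

    `generate` is a placeholder for any prompt -> text LLM call (assumption).
    Replies with no usable rating are skipped.
    """
    ratings = []
    for sample in samples:
        reply = generate(build_eval_prompt(instructions, sample, question))
        rating = parse_rating(reply)
        if rating is not None:
            ratings.append(rating)
    return mean(ratings) if ratings else None
```

For example, with a stub model `lambda prompt: "I would say 4."`, `llm_rate(["Once upon a time.", "The end."], "Rate the story.", "How fluent is the text?", stub)` averages two ratings of 4. The regex parser is deliberately naive: a reply such as "on a scale of 1-5, I'd give 4" would yield 1. A real setup would constrain the output format more tightly.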
Citation impact
257 total citations
- FWCI (field-weighted citation impact): 42.67
- Percentile: 100%
- References: 40
Authors: 2

Topics & keywords
Keywords
- Task (project management)
- Computer science
- Quality (philosophy)
- Adversarial system
- Artificial intelligence
- Natural language processing
- Data science
- Epistemology
UN Sustainable Development Goals
- Quality Education