Applying large language models and chain-of-thought for automatic scoring

Lee, Gyeong-Geon; Latif, Ehsan; Wu, Xuansheng; Liu, Ninghao; Zhai, Xiaoming

doi:10.1016/j.caeai.2024.100213

Article, Computers and Education: Artificial Intelligence, Feb 27, 2024 (gold open access)

University of Georgia

Indexed in: Crossref, DOAJ

Abstract

This study investigates the application of large language models (LLMs), specifically GPT-3.5 and GPT-4, with Chain-of-Thought (CoT) in the automatic scoring of student-written responses to science assessments. We focused on overcoming the challenges of accessibility, technical complexity, and lack of explainability that have previously limited the use of artificial intelligence-based automatic scoring tools among researchers and educators. With a testing dataset comprising six assessment tasks (three binomial and three trinomial) with 1650 student responses, we employed six prompt engineering strategies to automatically score student responses. The six strategies combined zero-shot or few-shot learning with…
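The abstract names two prompt-design axes: zero-shot versus few-shot learning, and chain-of-thought (CoT) reasoning. The sketch below shows one way such scoring prompts could be assembled and the model's reply turned into a score. It assumes a plain-text template. The template wording, `build_prompt`, `parse_score`, `accuracy`, and `Example` are all hypothetical and do not reproduce the study's prompts. It also assumes binomial tasks are scored 0 to 1 and trinomial tasks 0 to 2. The GPT-3.5 or GPT-4 call itself is left out.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Example:
    """Hypothetical human-scored response used as a few-shot demonstration."""
    response: str
    score: int


def build_prompt(task: str, rubric: str, response: str,
                 examples: list[Example] | None = None,
                 chain_of_thought: bool = False) -> str:
    """Assemble a scoring prompt (illustrative template, not the study's)."""
    parts = [f"Task: {task}", f"Rubric: {rubric}"]
    # Few-shot: include scored demonstrations; zero-shot: pass no examples.
    for i, ex in enumerate(examples or [], start=1):
        parts.append(f"Example {i}: {ex.response}\nScore: {ex.score}")
    parts.append(f"Student response: {response}")
    if chain_of_thought:
        # CoT: ask the model to reason before committing to a score.
        parts.append("Explain your reasoning step by step, "
                     "then end with 'Score: <integer>'.")
    else:
        parts.append("Reply only with 'Score: <integer>'.")
    return "\n\n".join(parts)


def parse_score(model_output: str, levels: int) -> int | None:
    """Return the last 'Score: k' in the output if 0 <= k < levels, else None."""
    # Take the last match, because CoT reasoning may mention other scores first.
    matches = re.findall(r"Score:\s*(\d+)", model_output)
    if not matches:
        return None
    score = int(matches[-1])
    return score if 0 <= score < levels else None


def accuracy(predicted: list[int | None], human: list[int]) -> float:
    """Share of responses where the model's score matches the human score."""
    agree = sum(p == h for p, h in zip(predicted, human))
    return agree / len(human)
```

With this template, a trinomial task would call `parse_score(reply, levels=3)`. An unparseable or out-of-range reply yields `None`, which counts as a disagreement in `accuracy`. Reading the last `Score:` line instead of the first is a design choice that suits CoT replies, where the final verdict comes after the reasoning.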

Citation impact

Total citations: 123
FWCI: 38.87
Percentile: 100%
References: 70
Authors: 5

Topics & keywords

Keywords

Rubric
Computer science
Artificial intelligence
Machine learning
Natural language processing
Mathematics education
Psychology

UN Sustainable Development Goals

Quality Education

Funding

National Science Foundation
Award: 2101104