Applying large language models and chain-of-thought for automatic scoring

University of Georgia

Indexed in Crossref, DOAJ

Abstract

This study investigates the application of large language models (LLMs), specifically GPT-3.5 and GPT-4, with Chain-of-Thought (CoT) prompting in the automatic scoring of student-written responses to science assessments. We focused on overcoming the challenges of accessibility, technical complexity, and lack of explainability that have previously limited the use of artificial intelligence-based automatic scoring tools among researchers and educators. Using a testing dataset of 1650 student responses to six assessment tasks (three binomial and three trinomial), we employed six prompt engineering strategies to score student responses automatically. The six strategies combined zero-shot or few-shot learning with…
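To make the difference between zero-shot and few-shot prompting, and the role of a CoT instruction, more concrete, here is a minimal Python sketch under stated assumptions. The names `ScoredExample`, `build_scoring_prompt`, and `parse_score`, the prompt wording, and the "Score:" output convention are all hypothetical illustrations, not the prompts used in the study. Sending the prompt to GPT-3.5 or GPT-4 is deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ScoredExample:
    """Hypothetical human-scored response used as a few-shot demonstration."""
    response: str
    score: int
    rationale: str = ""  # optional reasoning, shown only when CoT is enabled


def build_scoring_prompt(
    rubric: str,
    response: str,
    examples: Optional[Sequence[ScoredExample]] = None,
    chain_of_thought: bool = False,
) -> str:
    """Assemble a scoring prompt: zero-shot if no examples, few-shot otherwise.

    The wording below is an illustrative assumption, not the study's prompt.
    """
    parts = [
        "You are scoring a student's written response to a science assessment task.",
        f"Scoring rubric:\n{rubric}",
    ]
    # Few-shot: prepend scored demonstrations (with reasoning if CoT is on).
    for i, ex in enumerate(examples or [], start=1):
        demo = f"Example {i}:\nResponse: {ex.response}\n"
        if chain_of_thought and ex.rationale:
            demo += f"Reasoning: {ex.rationale}\n"
        demo += f"Score: {ex.score}"
        parts.append(demo)
    parts.append(f"Student response: {response}")
    # CoT: ask for stepwise reasoning against the rubric before the score.
    if chain_of_thought:
        parts.append(
            "Explain your reasoning step by step against the rubric, "
            "then give the final score on a line starting with 'Score:'."
        )
    else:
        parts.append("Reply with the score only, on a line starting with 'Score:'.")
    return "\n\n".join(parts)


def parse_score(model_output: str) -> Optional[int]:
    """Return the integer on the last 'Score:' line, or None if none parses."""
    score = None
    for line in model_output.splitlines():
        line = line.strip()
        if line.lower().startswith("score:"):
            try:
                score = int(line.split(":", 1)[1].strip())
            except ValueError:
                pass  # ignore malformed score lines
    return score


if __name__ == "__main__":
    rubric = "0 = no mechanism given; 1 = partial mechanism; 2 = complete mechanism"
    demo = ScoredExample("Heat energy moves into the ice.", 1, "Names energy transfer but not particles.")
    print(build_scoring_prompt(rubric, "The ice melts.", [demo], chain_of_thought=True))
```

Keeping the reasoning request and the final "Score:" line separate lets the explanation be read by a teacher while the score is still extracted mechanically, which is one way to address the explainability concern raised in the abstract.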

Citation impact

Total citations: 123
FWCI: 38.87
Percentile: 100%
References: 70

Authors: 5

Topics & keywords

Keywords
  • Rubric
  • Computer science
  • Artificial intelligence
  • Machine learning
  • Natural language processing
  • Mathematics education
  • Psychology
UN Sustainable Development Goals
  • Quality Education

Funding