Applying large language models and chain-of-thought for automatic scoring
Indexed in Crossref, DOAJ
Abstract
This study investigates the application of large language models (LLMs), specifically GPT-3.5 and GPT-4, with Chain-of-Thought (CoT) in the automatic scoring of student-written responses to science assessments. We focused on overcoming the challenges of accessibility, technical complexity, and lack of explainability that have previously limited the use of artificial intelligence-based automatic scoring tools among researchers and educators. With a testing dataset comprising six assessment tasks (three binomial and three trinomial) with 1650 student responses, we employed six prompt engineering strategies to automatically score student responses. The six strategies combined zero-shot or few-shot learning with…
Citation impact
- Total citations: 123
- FWCI: 38.87
- Percentile: 100%
- References: 70
Authors
5
Topics & keywords
Topics
Keywords
- Rubric
- Computer science
- Artificial intelligence
- Machine learning
- Natural language processing
- Mathematics education
- Psychology
UN Sustainable Development Goals
- Quality Education