🤖 AI Summary
This study addresses the time-intensive, non-scalable nature of manually assessing open-ended reflective writing in education. We propose an automated assessment and academic-performance prediction framework leveraging large language models (LLMs). Methodologically, we conduct the first systematic comparison of single-agent versus multi-agent architectures, each combined with zero-shot and few-shot prompting strategies, to build a quantitative scoring model on 5,278 student reflection texts. Results show that the few-shot single-agent approach achieves the highest inter-rater agreement with human graders. Moreover, LLM-generated reflection scores significantly improve at-risk student identification and final-grade prediction, outperforming traditional baseline models across all metrics. Our core contribution is the empirical validation of a lightweight, few-shot single-agent design as both effective and practical, establishing a scalable, psychometrically sound paradigm for intelligent educational assessment.
📝 Abstract
We explore the use of Large Language Models (LLMs) for automated assessment of open-text student reflections and prediction of academic performance. Traditional methods for evaluating reflections are time-consuming and do not scale well in educational settings. In this work, we employ LLMs to transform student reflections into quantitative scores using two assessment strategies (single-agent and multi-agent) and two prompting techniques (zero-shot and few-shot). Our experiments, conducted on a dataset of 5,278 reflections from 377 students over three academic terms, demonstrate that the single-agent, few-shot strategy achieves the highest match rate with human evaluations. Furthermore, models utilizing LLM-assessed reflection scores outperform baselines in both at-risk student identification and grade prediction tasks. These findings suggest that LLMs can effectively automate reflection assessment, reduce educators' workload, and enable timely support for students who may need additional assistance. Our work highlights the potential of integrating advanced generative AI technologies into educational practice to enhance student engagement and academic success.
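To make the best-performing setup concrete, the single-agent, few-shot strategy can be sketched as a single LLM call whose prompt contains a rubric plus a handful of human-scored example reflections. This is a minimal illustrative sketch only: the rubric wording, example reflections, score scale, and the `call_llm` stub are assumptions, not the authors' actual prompts, data, or model.

```python
# Hypothetical sketch of single-agent, few-shot reflection scoring.
# The rubric, examples, and scale below are illustrative assumptions.

FEW_SHOT_EXAMPLES = [
    ("I reviewed my notes but skipped the practice problems.", 2),
    ("I compared my solution with the model answer, found two gaps in my "
     "understanding of recursion, and re-studied those topics.", 5),
]

def build_prompt(reflection: str) -> str:
    """Assemble one prompt: rubric + scored examples + the target reflection."""
    lines = [
        "You are a grader. Score the student reflection from 1 (superficial)",
        "to 5 (deeply reflective). Reply with the score only.",
        "",
    ]
    for text, score in FEW_SHOT_EXAMPLES:
        lines += [f"Reflection: {text}", f"Score: {score}", ""]
    lines += [f"Reflection: {reflection}", "Score:"]
    return "\n".join(lines)

def score_reflection(reflection: str, call_llm) -> int:
    """Single agent, single call: parse an integer score from the LLM reply."""
    reply = call_llm(build_prompt(reflection))
    return int(reply.strip().split()[0])

# Stub LLM for demonstration; a real deployment would call an actual model.
demo_llm = lambda prompt: "4"
print(score_reflection("I summarized the lecture and noted open questions.",
                       demo_llm))  # prints 4 with the stub
```

The multi-agent variant would instead route the same reflection through several such calls (e.g., independent grader agents plus an aggregator), which the paper finds unnecessary here: one few-shot agent already aligns best with human graders.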