🤖 AI Summary
To address the inefficiency and low quality of automated teaching feedback, this paper proposes a Generate–Evaluate–Regenerate (G-E-RG) multi-agent framework, a three-stage closed-loop optimization paradigm. The framework combines constructivism and two other educational theories with two prompting methods, zero-shot prompting and retrieval-augmented generation (RAG)-enhanced chain-of-thought (CoT) reasoning, yielding six generation methods whose feedback is intended to be valid, reliable, structurally complete, and pedagogically interpretable. It is presented as the first systematic integration of foundational educational theory with state-of-the-art large language model (LLM) prompting techniques. Evaluation shows that the share of feedback containing all four core feedback components rises from an average of 27.72% to an average of 98.49%. Across the six generation methods, evaluation accuracy improves by 3.36% to 12.98% (p < 0.001), most feedback quality dimensions show statistically significant gains (p < 0.001), and conciseness improves significantly for three of the six methods.
📝 Abstract
Producing large volumes of high-quality, timely feedback poses significant challenges for instructors. Automation technologies, particularly Large Language Models (LLMs), show great potential for addressing this issue. However, current LLM-based approaches still leave room for improvement in feedback quality. Our study proposed a multi-agent approach performing "generation, evaluation, and regeneration" (G-E-RG) to further enhance feedback quality. In the first generation phase, six methods were adopted, combining three theoretical feedback frameworks with two prompting methods: zero-shot and retrieval-augmented generation with chain-of-thought (RAG_CoT). The results indicated that, compared to first-round feedback, G-E-RG significantly improved the final feedback across the six methods on most dimensions. Specifically: (1) Evaluation accuracy for the six methods increased by 3.36% to 12.98% (p<0.001); (2) The proportion of feedback containing four effective components rose from an average of 27.72% to an average of 98.49% across the six methods, and the sub-dimensions of providing critiques, highlighting strengths, encouraging agency, and cultivating dialogue also improved markedly (p<0.001); (3) Most feature values improved significantly (p<0.001), although some sub-dimensions (e.g., strengthening the teacher-student relationship) still require further enhancement; (4) The simplicity of feedback was effectively enhanced (p<0.001) for three methods.
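The generate, evaluate, regenerate flow described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the framework names, the `g_e_rg` function, the stub agents, and the keyword-based component check are all hypothetical placeholders, and a single regeneration round is assumed. In a real system each agent would be an LLM call rather than a stub.

```python
from itertools import product
from typing import Callable, List

# Hypothetical placeholders: the abstract does not name the three frameworks.
FRAMEWORKS = ["framework_1", "framework_2", "framework_3"]
PROMPT_METHODS = ["zero_shot", "rag_cot"]

# Illustrative labels only, borrowed from the sub-dimensions listed in the abstract.
COMPONENTS = ["critique", "strength", "agency", "dialogue"]


def build_methods() -> List[str]:
    """Pair every framework with every prompting method (3 x 2 = 6 methods)."""
    return [f"{fw}+{pm}" for fw, pm in product(FRAMEWORKS, PROMPT_METHODS)]


def g_e_rg(
    submission: str,
    generate: Callable[[str], str],
    evaluate: Callable[[str, str], List[str]],
    regenerate: Callable[[str, str, List[str]], str],
) -> str:
    """Run one generate -> evaluate -> regenerate pass (single round assumed)."""
    draft = generate(submission)             # first-round feedback
    issues = evaluate(submission, draft)     # evaluator reports what is missing
    if not issues:
        return draft                         # nothing to fix, keep the draft
    return regenerate(submission, draft, issues)


# Stub agents standing in for LLM calls.
def stub_generate(submission: str) -> str:
    return "strength: clear thesis."


def stub_evaluate(submission: str, draft: str) -> List[str]:
    # Naive keyword check: report every component label absent from the draft.
    return [c for c in COMPONENTS if c not in draft]


def stub_regenerate(submission: str, draft: str, missing: List[str]) -> str:
    # Append a placeholder sentence for each missing component.
    additions = " ".join(f"{c}: (added)." for c in missing)
    return f"{draft} {additions}"


if __name__ == "__main__":
    print(build_methods())
    print(g_e_rg("student essay", stub_generate, stub_evaluate, stub_regenerate))
```

Passing the agents in as callables keeps the loop independent of any particular model or prompt, so each of the six methods could reuse the same control flow with a different `generate` function.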