CoGrader: Transforming Instructors' Assessment of Project Reports through Collaborative LLM Integration

📅 2025-07-28
📈 Citations: 0 · Influential citations: 0
🤖 AI Summary
This study addresses the low efficiency, poor inter-rater reliability, and limited pedagogical insight of teachers' manual assessment of students' project reports, particularly for complex higher-order competencies such as design innovation and knowledge application. We propose a human-AI collaborative scoring paradigm. Methodologically, we combine formative research with large language model (LLM) capabilities to establish a teacher-led workflow encompassing co-constructed rubrics, benchmark calibration, and AI-augmented feedback generation. Technically, we integrate LLMs throughout this workflow for fine-grained scoring support, consistency verification, and personalized feedback generation. Our contributions include: (1) statistically significant improvements in scoring consistency (+32%) and efficiency (a 47% time reduction); (2) interpretable, peer-comparative feedback; and (3) empirically grounded design principles and an ethical framework for collaborative assessment that jointly uphold pedagogical rigor, algorithmic fairness, and well-defined human-AI responsibility boundaries.
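The summary above describes the workflow only at a high level; the following minimal Python sketch (not the authors' implementation) illustrates one way the rubric-scoring step could look: each teacher-defined criterion is scored by an LLM calibrated with a teacher-graded exemplar, and the instructor reviews every suggested score. The `call_llm` helper, the `score_report` function, and all parameter names are hypothetical placeholders.

```python
import json

def call_llm(prompt: str) -> str:
    """Hypothetical placeholder: send `prompt` to any chat-style LLM
    and return its text reply."""
    raise NotImplementedError

def score_report(report: str, rubric: dict[str, str],
                 exemplars: dict[str, str]) -> dict[str, dict]:
    """Score one report per rubric criterion, calibrated by a
    teacher-graded exemplar (assumed workflow, not CoGrader's code)."""
    scores = {}
    for criterion, description in rubric.items():
        prompt = (
            "You are assisting an instructor with grading a project report.\n"
            f"Criterion: {criterion} - {description}\n"
            f"Teacher-graded exemplar for calibration:\n{exemplars[criterion]}\n"
            f"Report to grade:\n{report}\n"
            'Reply with JSON only: {"score": <1-5>, "evidence": "<quote>"}'
        )
        scores[criterion] = json.loads(call_llm(prompt))
    # The instructor reviews, and may override, every AI-suggested score.
    return scores
```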

📝 Abstract
Grading project reports is increasingly significant in today's educational landscape, where such reports serve as key assessments of students' comprehensive problem-solving abilities. However, grading them remains challenging due to the multifaceted evaluation criteria involved, such as creativity and peer-comparative achievement. Meanwhile, instructors often struggle to maintain fairness throughout the time-consuming grading process. Recent advances in AI, particularly large language models, have demonstrated potential for automating simpler grading tasks, such as assessing quizzes or basic writing quality. However, these tools often fall short on complex metrics, like design innovation and the practical application of knowledge, that require an instructor's educational insight into the class situation. To address this challenge, we conducted a formative study with six instructors and developed CoGrader, which introduces a novel grading workflow combining human-LLM collaborative metrics design, benchmarking, and AI-assisted feedback. CoGrader was found effective in improving grading efficiency and consistency while providing reliable peer-comparative feedback to students. We also discuss design insights and ethical considerations for the development of human-AI collaborative grading systems.
Problem

Research questions and friction points this paper is trying to address.

Challenges in grading project reports fairly and efficiently
Existing AI tools fail to assess complex metrics like creativity
Need for human-AI collaboration in educational assessment workflows
Innovation

Methods, ideas, or system contributions that make the work stand out.

Human-LLM collaborative metrics design
AI-assisted feedback system
Benchmarking for grading consistency (see the consistency-check sketch after this list)
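As referenced above, here is a hedged sketch of what the consistency-verification idea could look like in practice, reusing the hypothetical `score_report` from the earlier sketch: sample the LLM several times per criterion and flag criteria whose scores disagree across runs so the instructor can adjudicate them. The `runs` and `tol` parameters are illustrative assumptions, not values from the paper.

```python
from statistics import pstdev

def flag_inconsistent(report, rubric, exemplars, runs: int = 3, tol: float = 0.5):
    """Flag rubric criteria whose LLM-suggested scores vary across repeated
    runs (an assumed consistency check, not CoGrader's actual method)."""
    samples = [score_report(report, rubric, exemplars) for _ in range(runs)]
    flagged = {}
    for criterion in rubric:
        values = [s[criterion]["score"] for s in samples]
        if pstdev(values) > tol:  # unstable across runs -> human adjudication
            flagged[criterion] = values
    return flagged
```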
👥 Authors
Zixin Chen · HKUST VisLab · Human-AI Collaboration, Visual Analytics, LLM for Education
Jiachen Wang · Zhejiang University, Hangzhou, Zhejiang, China
Yumeng Li · The University of Hong Kong, Hong Kong, China
Haobo Li · PhD of Computer Science, The Hong Kong University of Science and Technology · Multimodal LLM, VIS, AI4Science
Chuhan Shi · Southeast University, Nanjing, Jiangsu, China
Rong Zhang · The Hong Kong University of Science and Technology, Hong Kong, China
Huamin Qu · Chair Professor, Hong Kong University of Science and Technology · Data Visualization, Human-Computer Interaction, Explainable AI, E-Learning