CoGrader: Transforming Instructors' Assessment of Project Reports through Collaborative LLM Integration

📅 2025-07-28
📈 Citations: 0 · Influential citations: 0
🤖 AI Summary
This study addresses the low efficiency, poor inter-rater reliability, and limited pedagogical insight of teachers' manual assessment of students' project reports, particularly for complex higher-order competencies such as design innovation and knowledge application. We propose a human-AI collaborative scoring paradigm. Methodologically, we combine formative research with large language model (LLM) capabilities to establish a teacher-led workflow encompassing co-constructed rubrics, benchmark calibration, and AI-augmented feedback generation. Technically, we integrate LLMs throughout this workflow for fine-grained scoring support, consistency verification, and personalized feedback generation. Our contributions include: (1) statistically significant improvements in scoring consistency (+32%) and efficiency (a 47% time reduction); (2) interpretable, peer-comparative feedback; and (3) empirically grounded design principles and an ethical framework for collaborative assessment that jointly uphold pedagogical rigor, algorithmic fairness, and well-defined human-AI responsibility boundaries.
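The summary above describes the workflow only at a high level; the following minimal Python sketch (not the authors' implementation) illustrates one way the rubric-scoring step could look: each teacher-defined criterion is scored by an LLM calibrated with a teacher-graded exemplar, and the instructor reviews every suggested score. The `call_llm` helper, the `score_report` function, and all parameter names are hypothetical placeholders.

```python
import json

def call_llm(prompt: str) -> str:
    """Hypothetical placeholder: send `prompt` to any chat-style LLM
    and return its text reply."""
    raise NotImplementedError

def score_report(report: str, rubric: dict[str, str],
                 exemplars: dict[str, str]) -> dict[str, dict]:
    """Score one report per rubric criterion, calibrated by a
    teacher-graded exemplar (assumed workflow, not CoGrader's code)."""
    scores = {}
    for criterion, description in rubric.items():
        prompt = (
            "You are assisting an instructor with grading a project report.\n"
            f"Criterion: {criterion} - {description}\n"
            f"Teacher-graded exemplar for calibration:\n{exemplars[criterion]}\n"
            f"Report to grade:\n{report}\n"
            'Reply with JSON only: {"score": <1-5>, "evidence": "<quote>"}'
        )
        scores[criterion] = json.loads(call_llm(prompt))
    # The instructor reviews, and may override, every AI-suggested score.
    return scores
```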

📝 Abstract
Grading project reports is increasingly significant in today's educational landscape, where such reports serve as key assessments of students' comprehensive problem-solving abilities. However, grading them remains challenging due to the multifaceted evaluation criteria involved, such as creativity and peer-comparative achievement. Meanwhile, instructors often struggle to maintain fairness throughout the time-consuming grading process. Recent advances in AI, particularly large language models, have demonstrated potential for automating simpler grading tasks, such as assessing quizzes or basic writing quality. However, these tools often fall short on complex metrics, like design innovation and the practical application of knowledge, that require an instructor's educational insight into the class situation. To address this challenge, we conducted a formative study with six instructors and developed CoGrader, which introduces a novel grading workflow combining human-LLM collaborative metrics design, benchmarking, and AI-assisted feedback. CoGrader was found effective in improving grading efficiency and consistency while providing reliable peer-comparative feedback to students. We also discuss design insights and ethical considerations for the development of human-AI collaborative grading systems.
Problem

Research questions and friction points this paper is trying to address.

Challenges in grading project reports fairly and efficiently
Existing AI tools fail to assess complex metrics like creativity
Need for human-AI collaboration in educational assessment workflows
Innovation

Methods, ideas, or system contributions that make the work stand out.

Human-LLM collaborative metrics design
AI-assisted feedback system
Benchmarking for grading consistency (see the consistency-check sketch after this list)
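As referenced above, here is a hedged sketch of what the consistency-verification idea could look like in practice, reusing the hypothetical `score_report` from the earlier sketch: sample the LLM several times per criterion and flag criteria whose scores disagree across runs so the instructor can adjudicate them. The `runs` and `tol` parameters are illustrative assumptions, not values from the paper.

```python
from statistics import pstdev

def flag_inconsistent(report, rubric, exemplars, runs: int = 3, tol: float = 0.5):
    """Flag rubric criteria whose LLM-suggested scores vary across repeated
    runs (an assumed consistency check, not CoGrader's actual method)."""
    samples = [score_report(report, rubric, exemplars) for _ in range(runs)]
    flagged = {}
    for criterion in rubric:
        values = [s[criterion]["score"] for s in samples]
        if pstdev(values) > tol:  # unstable across runs -> human adjudication
            flagged[criterion] = values
    return flagged
```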
👥 Authors
Zixin Chen · HKUST VisLab · Human-AI Collaboration, Visual Analytics, LLM for Education
Jiachen Wang · Zhejiang University, Hangzhou, Zhejiang, China
Yumeng Li · The University of Hong Kong, Hong Kong, China
Haobo Li · PhD of Computer Science, The Hong Kong University of Science and Technology · Multimodal LLM, VIS, AI4Science
Chuhan Shi · Southeast University, Nanjing, Jiangsu, China
Rong Zhang · The Hong Kong University of Science and Technology, Hong Kong, China
Huamin Qu · Chair Professor, Hong Kong University of Science and Technology · Data Visualization, Human-Computer Interaction, Explainable AI, E-Learning