From Coders to Critics: Empowering Students through Peer Assessment in the Age of AI Copilots

📅 2025-05-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
The widespread adoption of AI programming assistants exacerbates assessment distortion and academic integrity violations in programming education, and leaves higher-order competencies underdeveloped. This study empirically evaluates a scalable, anonymized, rubric-driven peer assessment mechanism as a robust alternative to traditional instructor grading, which is vulnerable to AI-generated submissions, in a large introductory programming course. Employing a mixed-methods design, we quantify inter-rater reliability via correlation (r ≈ 0.6), mean absolute error (MAE), and root mean square error (RMSE), and analyze educational impact through reflection surveys from 47 student teams. Results demonstrate that the framework maintains baseline reliability while significantly enhancing students' evaluative competence, feedback quality, and course engagement. To our knowledge, this is the first systematic validation in large-scale programming instruction of peer assessment's dual efficacy in upholding academic integrity and fostering higher-order thinking. We propose a novel peer assessment framework balancing reliability, scalability, and pedagogical value.

📝 Abstract
The rapid adoption of AI-powered coding assistants such as ChatGPT and other coding copilots is transforming programming education, raising questions about assessment practices, academic integrity, and skill development. As educators seek alternatives to traditional grading methods susceptible to AI-enabled plagiarism, structured peer assessment could be a promising strategy. This paper presents an empirical study of a rubric-based, anonymized peer review process implemented in a large introductory programming course. Students evaluated each other's final projects (a 2D game), and their assessments were compared to instructor grades using correlation, mean absolute error (MAE), and root mean square error (RMSE). Additionally, reflective surveys from 47 teams captured student perceptions of fairness, grading behavior, and preferences regarding grade aggregation. Results show that peer review can approximate instructor evaluation with moderate accuracy while fostering student engagement, evaluative thinking, and interest in providing constructive feedback to peers. We discuss the implications of these findings for designing scalable, trustworthy peer assessment systems in the age of AI-assisted coding.
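
To make the comparison concrete, here is a minimal sketch (in Python, with made-up per-team scores rather than the study's data) of how peer-assigned and instructor-assigned grades can be compared using the three metrics the paper reports:

```python
import numpy as np

def agreement_metrics(peer, instructor):
    """Quantify agreement between peer and instructor scores."""
    peer = np.asarray(peer, dtype=float)
    instructor = np.asarray(instructor, dtype=float)
    diff = peer - instructor
    # Pearson correlation: do peers rank projects like the instructor?
    r = np.corrcoef(peer, instructor)[0, 1]
    # Mean absolute error: typical magnitude of the grading deviation
    mae = np.mean(np.abs(diff))
    # Root mean square error: like MAE, but penalizes large outliers more
    rmse = np.sqrt(np.mean(diff ** 2))
    return r, mae, rmse

# Hypothetical scores on a 0-100 scale (illustrative, not the paper's data)
peer_scores = [88, 72, 95, 64, 81, 90, 77]
instructor_scores = [85, 70, 92, 70, 78, 93, 75]

r, mae, rmse = agreement_metrics(peer_scores, instructor_scores)
print(f"r = {r:.2f}, MAE = {mae:.2f}, RMSE = {rmse:.2f}")
```

Under this setup, a correlation around 0.6 with a modest MAE would match the "moderate accuracy" the paper describes: peer rankings broadly track instructor rankings, while individual scores still deviate by a few points.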
Problem

Research questions and friction points this paper addresses.

Addressing AI's impact on programming education assessment
Exploring peer review as an alternative to traditional grading
Evaluating accuracy and benefits of student peer assessment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Rubric-based anonymized peer review process
Comparison of peer and instructor grades
Reflective surveys on fairness and grading
Santiago Berrezueta-Guzman
Technical University of Munich, Heilbronn, Germany
Stephan Krusche
Professor, Computer Science, Technical University Munich
Education Technologies, Human Computer Interactions, Software Engineering, Machine Learning
Stefan Wagner
Technical University of Munich, Heilbronn, Germany