🤖 AI Summary
In programming education, AI-generated immediate feedback often lacks accuracy, while instructor-provided feedback is difficult to scale. Method: This paper proposes a "teacher-in-the-loop" collaborative framework that establishes a closed-loop support system in which AI generates initial hints and escalates to human instructors when student feedback indicates the AI support fell short. The system integrates hint generation, student feedback collection, and instructor intervention. It was deployed and evaluated in a real-world data science course (N = 82). Results: Of 673 AI-generated hints, 146 (22%) were rated unhelpful by students, and 16 of those unhelpful cases (11%) were escalated to instructors. Notably, roughly half of the instructor responses in these escalated cases were themselves incorrect or insufficient, suggesting that cases where AI fails are also difficult for human experts. The system is open-sourced, offering a reusable approach to AI-augmented educational support.
📝 Abstract
Timely and high-quality feedback is essential for effective learning in programming courses, yet providing such support at scale remains a challenge. While AI-based systems offer scalable and immediate help, their responses can occasionally be inaccurate or insufficient. Human instructors, in contrast, bring deeper expertise but are limited in time and availability. To address these limitations, we present a hybrid help framework that integrates AI-generated hints with an escalation mechanism, allowing students to request feedback from instructors when AI support falls short. This design leverages the strengths of AI for scale and responsiveness while reserving instructor effort for moments of greatest need. We deployed this tool in a data science programming course with 82 students. Out of 673 AI-generated hints, students rated 146 (22%) as unhelpful, and only 16 (11%) of those unhelpful cases were escalated to the instructors. A qualitative investigation of instructor responses showed that the instructor feedback in these escalated cases was itself incorrect or insufficient roughly half the time. This finding suggests that when AI support fails, even expert instructors must pay close attention to avoid making mistakes. We will publicly release the tool for broader adoption and to enable further studies in other classrooms. Our work contributes a practical approach to scaling high-quality support and informs future efforts to effectively integrate AI and humans in education.