PyEvalAI: AI-assisted evaluation of Jupyter Notebooks for immediate personalized feedback

📅 2025-02-25
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Manual grading of STEM coursework often takes about a week per class, resulting in delayed feedback, hindered student iteration, and diminished learning outcomes. To address key limitations of existing AI-powered educational tools—including inadequate privacy safeguards, lack of model transparency, poor support for multi-format submissions (e.g., Markdown, LaTeX, Python), and low instructor engagement—this paper introduces an automated Jupyter Notebook grading framework that combines lightweight, locally deployable open-source LLMs with programmable unit testing. Built upon the Jupyter API and sandboxed execution, the framework enables fully local, end-to-end deployment—keeping institutional data on-premises and granting instructors full oversight and control. A case study in a university numerical computing course demonstrates faster feedback and improved grading efficiency, supporting both technical efficacy and pedagogical utility.
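The core idea—scoring a notebook cell with programmable unit tests and then asking a locally hosted model for qualitative feedback—can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the example task, and the prompt format are all assumptions, and a real grader would run student code in a secure sandbox rather than plain `exec()`.

```python
# Hedged sketch of unit-test-based notebook grading with an LLM feedback
# prompt. All names are illustrative; PyEvalAI's actual API may differ.

def run_unit_tests(student_code, tests):
    """Execute student code in a fresh namespace and apply unit tests.

    NOTE: exec() is used only to keep the sketch self-contained;
    a production grader needs sandboxed execution.
    """
    ns = {}
    exec(student_code, ns)
    results = []
    for name, test in tests:
        try:
            test(ns)
            results.append((name, True, ""))
        except Exception as exc:
            results.append((name, False, str(exc)))
    return results

def build_feedback_prompt(task, student_code, results):
    """Summarize test outcomes into a prompt for a locally hosted model."""
    lines = [f"Task: {task}", "Student solution:", student_code, "Test results:"]
    for name, ok, msg in results:
        lines.append(f"- {name}: " + ("PASS" if ok else f"FAIL ({msg})"))
    lines.append("Write short, encouraging feedback on any failures.")
    return "\n".join(lines)

# Example: a numerics exercise (trapezoid rule), in the spirit of the
# course evaluated in the paper.
student_code = (
    "def trapezoid(f, a, b, n=100):\n"
    "    h = (b - a) / n\n"
    "    return h * (f(a)/2 + sum(f(a + i*h) for i in range(1, n)) + f(b)/2)"
)

tests = [
    ("integrates x^2 on [0,1]",
     lambda ns: abs(ns["trapezoid"](lambda x: x * x, 0, 1) - 1/3) < 1e-3),
]

results = run_unit_tests(student_code, tests)
prompt = build_feedback_prompt("Implement the trapezoid rule", student_code, results)
# `prompt` would then be sent to a locally hosted LLM to draft feedback.
```

The separation mirrors the paper's design: deterministic unit tests handle correctness and scoring, while the language model only produces natural-language feedback from the test results, keeping instructors in control of the grading criteria.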

📝 Abstract
Grading student assignments in STEM courses is a laborious and repetitive task for tutors, often requiring a week to assess an entire class. For students, this delay of feedback prevents iterating on incorrect solutions, hampers learning, and increases stress when exercise scores determine admission to the final exam. Recent advances in AI-assisted education, such as automated grading and tutoring systems, aim to address these challenges by providing immediate feedback and reducing grading workload. However, existing solutions often fall short due to privacy concerns, reliance on proprietary closed-source models, lack of support for combining Markdown, LaTeX and Python code, or excluding course tutors from the grading process. To overcome these limitations, we introduce PyEvalAI, an AI-assisted evaluation system, which automatically scores Jupyter notebooks using a combination of unit tests and a locally hosted language model to preserve privacy. Our approach is free, open-source, and ensures tutors maintain full control over the grading process. A case study demonstrates its effectiveness in improving feedback speed and grading efficiency for exercises in a university-level course on numerics.
Problem

Research questions and friction points this paper is trying to address.

Automated grading of Jupyter Notebooks
Immediate personalized feedback for students
Privacy-preserving AI-assisted evaluation system
Innovation

Methods, ideas, or system contributions that make the work stand out.

AI-assisted evaluation system
Unit tests and local model
Open-source and privacy-preserving
Nils Wandel
Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn NRW, Germany
David Stotko
Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn NRW, Germany
Alexander Schier
Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn NRW, Germany
Reinhard Klein
Professor of Computer Science, Bonn University
Computer Graphics · Material Appearance · Rendering · Geometry Processing · Architectural Geometry