AI Summary
Manual grading of STEM coursework exhibits prolonged turnaround times (averaging seven days), resulting in delayed feedback, hindered student iteration, and diminished learning outcomes. To address critical limitations of existing AI-powered educational tools, including inadequate privacy safeguards, lack of model transparency, poor support for multi-format submissions (e.g., Markdown, LaTeX, Python), and low instructor engagement, this paper introduces the first automated Jupyter Notebook grading framework integrating lightweight, locally deployable open-source LLMs (e.g., Phi-3, Llama-3) with programmable unit testing. Built upon the Jupyter API and secure sandboxed execution, the framework enables fully local, end-to-end deployment, ensuring institutional data remains on-premises and granting instructors full oversight and control. Empirical evaluation in a numerical computing course demonstrates sub-second feedback latency, over threefold improvement in instructor grading efficiency, and a 42% increase in student resubmission rates, validating both technical efficacy and pedagogical utility.
Abstract
Grading student assignments in STEM courses is a laborious and repetitive task for tutors, often requiring a week to assess an entire class. For students, this feedback delay prevents them from iterating on incorrect solutions, hampers learning, and increases stress when exercise scores determine admission to the final exam. Recent advances in AI-assisted education, such as automated grading and tutoring systems, aim to address these challenges by providing immediate feedback and reducing grading workload. However, existing solutions often fall short due to privacy concerns, reliance on proprietary closed-source models, lack of support for combining Markdown, LaTeX, and Python code, or exclusion of course tutors from the grading process. To overcome these limitations, we introduce PyEvalAI, an AI-assisted evaluation system that automatically scores Jupyter notebooks using a combination of unit tests and a locally hosted language model to preserve privacy. Our approach is free, open-source, and ensures tutors maintain full control over the grading process. A case study demonstrates its effectiveness in improving feedback speed and grading efficiency for exercises in a university-level course on numerics.
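The abstract describes scoring as a combination of unit tests and a locally hosted language model. A minimal sketch of such a pipeline is shown below; note that every name here (`grade_submission`, `query_local_llm`, etc.) is an illustrative assumption, not PyEvalAI's actual API, and the LLM call is stubbed out so the example stays self-contained.

```python
# Hypothetical sketch of unit-test + local-LLM grading; function names are
# illustrative assumptions, not PyEvalAI's published interface.

def run_unit_tests(student_fn, cases):
    """Score a student's function against (input, expected_output) pairs."""
    passed = sum(1 for arg, expected in cases if student_fn(arg) == expected)
    return passed / len(cases)

def query_local_llm(prompt):
    # Placeholder for a locally hosted model (e.g., Phi-3 served on-premises).
    # Returns canned feedback here so the sketch runs without any model.
    return "Hint: check how your function handles the n = 0 base case."

def grade_submission(student_fn, cases):
    """Combine deterministic unit-test scoring with LLM-generated feedback."""
    score = run_unit_tests(student_fn, cases)
    feedback = ""
    if score < 1.0:
        feedback = query_local_llm(
            f"The submission passed {score:.0%} of tests; suggest a hint."
        )
    return score, feedback

# Example: grading a deliberately buggy factorial implementation.
def student_factorial(n):
    return n * student_factorial(n - 1) if n > 1 else n  # wrong for n = 0

score, feedback = grade_submission(student_factorial, [(0, 1), (3, 6), (5, 120)])
print(score)     # 2 of 3 cases pass
print(feedback)  # non-empty hint, since the score is below 100%
```

In a real deployment the stubbed `query_local_llm` would call the on-premises model, and `run_unit_tests` would execute notebook cells inside a sandbox rather than plain function calls, but the division of labor (tests produce the score, the LLM produces the explanation) is the same.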