Automated Program Repair of Uncompilable Student Code

📅 2025-10-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
A substantial proportion of novice programming submissions fail to compile, forcing traditional student modeling and knowledge tracing approaches to discard such data—thereby losing critical insights into learners’ developmental processes.

Method: We propose a large language model (LLM)-based compilability repair framework that systematically evaluates GPT-5, Claude 3.5 Haiku, and Gemini 2.5 Flash across high- and low-context settings, with emphasis on control-flow preservation and syntactic-structural consistency. Our method integrates compiler feedback–driven iterative repair, edit-distance constraints, and abstract syntax tree (AST)-level structural validation to ensure repaired code both compiles successfully and faithfully retains the student’s original logic and coding style.

Contribution/Results: Experiments demonstrate significant improvements in repair success rates and pedagogical utility for non-compilable submissions. The framework delivers high-fidelity, low-distortion input data, enabling fine-grained learning process modeling and robust knowledge tracing.
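The repair loop summarized above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `llm_fix` stands in for whatever LLM call proposes a repair, the similarity threshold is an assumed value, and the keyword profile is a rough stand-in for the paper's structural checks (which must work even though the original code does not parse).

```python
import difflib
import re

def try_compile(source: str):
    """Return None if the source compiles, else the compiler's error message."""
    try:
        compile(source, "<student>", "exec")
        return None
    except SyntaxError as e:
        return f"{e.msg} (line {e.lineno})"

def edit_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; 1.0 means identical."""
    return difflib.SequenceMatcher(None, a, b).ratio()

def keyword_profile(source: str) -> dict:
    """Rough control-flow profile that works even on unparsable code:
    counts of control-flow keywords found by token-boundary regex."""
    return {kw: len(re.findall(rf"\b{kw}\b", source))
            for kw in ("if", "for", "while", "def", "return")}

def iterative_repair(source: str, llm_fix, max_rounds: int = 3,
                     min_similarity: float = 0.8):
    """Feed compiler errors back to the model until the code compiles,
    then accept only low-distortion, structure-preserving repairs.
    `llm_fix(code, error) -> code` is a hypothetical LLM interface."""
    candidate = source
    for _ in range(max_rounds):
        error = try_compile(candidate)
        if error is None:
            break
        candidate = llm_fix(candidate, error)  # model proposes a new version
    if try_compile(candidate) is not None:
        return None  # repair failed within the round budget
    if edit_similarity(source, candidate) < min_similarity:
        return None  # too much distortion of the student's code
    if keyword_profile(source) != keyword_profile(candidate):
        return None  # control-flow keywords changed: structure not preserved
    return candidate
```

The two rejection rules encode the summary's "low-distortion" goal: a repair that compiles but rewrites the student's logic is worthless for modeling, so it is treated the same as a failed repair.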

📝 Abstract
A significant portion of student programming submissions in CS1 learning environments are uncompilable, limiting their use in student modeling and downstream knowledge tracing. Traditional modeling pipelines often exclude these cases, discarding observations of student learning. This study investigates automated program repair as a strategy to recover uncompilable code while preserving students' structural intent for use in student modeling. Within this framework, we assess large language models (LLMs) as repair agents, including GPT-5 (OpenAI), Claude 3.5 Haiku (Anthropic), and Gemini 2.5 Flash (Google), under high- and low-context prompting conditions. Repairs were evaluated for compilability, edit distance, and preservation of students' original structure and logic. We find that while all three LLMs are capable of producing compilable repairs, their behavior diverges in how well they preserve students' control flow and code structure, which affects their pedagogical utility. By recovering uncompilable submissions, this work enables richer and more comprehensive analyses of learners' coding processes and development over time.
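The abstract's control-flow preservation criterion can be approximated by comparing repaired versions at the AST level. A minimal sketch, under the assumption that the comparison happens between two compilable versions (e.g. a faithful repair versus a rewrite-heavy one); the outline format is invented for illustration and is not the paper's metric:

```python
import ast

def flow_outline(source: str) -> list:
    """Depth-annotated outline of control-flow constructs, e.g. ['0:For', '1:If']."""
    outline = []

    def visit(node, depth=0):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.If, ast.For, ast.While, ast.FunctionDef)):
                outline.append(f"{depth}:{type(child).__name__}")
                visit(child, depth + 1)
            else:
                visit(child, depth)

    visit(ast.parse(source))
    return outline

def preserves_control_flow(reference: str, candidate: str) -> bool:
    """True when both versions have the same nesting of loops, branches,
    and function definitions, regardless of expression-level edits."""
    return flow_outline(reference) == flow_outline(candidate)
```

Under this check, changing an expression inside a loop body passes, while replacing an explicit loop with a comprehension fails, which matches the abstract's distinction between surface edits and structural rewrites.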
Problem

Research questions and friction points this paper is trying to address.

Repairing uncompilable student code submissions
Preserving structural intent for student modeling
Evaluating LLMs' pedagogical utility in code repair
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using large language models for code repair
Preserving student code structure during repair
Evaluating repairs for compilability and logic preservation