Towards a Neural Debugger for Python

📅 2026-03-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a limitation of existing neural interpreters: they do not support the interactive debugging operations, such as breakpoints and step-wise execution, that developers rely on. It introduces, for the first time, the concept of a "neural debugger," a model of how program state evolves under conditional debugging actions, obtained either by fine-tuning large language models or by training compact models from scratch on Python execution-trace data. The proposed model supports standard debugging commands, including setting breakpoints and stepping into, over, or out of functions, and handles both forward output prediction and backward input inference. Evaluated on the CruxEval benchmark, it performs strongly on both tasks, supporting its use as an interactive, agent-accessible world model of code execution.

📝 Abstract
Training large language models (LLMs) on Python execution traces grounds them in code execution and enables the line-by-line execution prediction of whole Python programs, effectively turning them into neural interpreters (FAIR CodeGen Team et al., 2025). However, developers rarely execute programs step by step; instead, they use debuggers to stop execution at certain breakpoints and step through relevant portions only while inspecting or modifying program variables. Existing neural interpreter approaches lack such interactive control. To address this limitation, we introduce neural debuggers: language models that emulate traditional debuggers, supporting operations such as stepping into, over, or out of functions, as well as setting breakpoints at specific source lines. We show that neural debuggers -- obtained via fine-tuning large LLMs or pre-training smaller models from scratch -- can reliably model both forward execution (predicting future states and outputs) and inverse execution (inferring prior states or inputs) conditioned on debugger actions. Evaluated on CruxEval, our models achieve strong performance on both output and input prediction tasks, demonstrating robust conditional execution modeling. Our work takes first steps towards future agentic coding systems in which neural debuggers serve as a world model for simulated debugging environments, providing execution feedback or enabling agents to interact with real debugging tools. This capability lays the foundation for more powerful code generation, program understanding, and automated debugging.
Problem

Research questions and friction points this paper is trying to address.

neural debugger
interactive debugging
execution trace
program understanding
code generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

neural debugger
interactive debugging
execution trace modeling
conditional execution
program understanding