debug-gym: A Text-Based Environment for Interactive Debugging

📅 2025-03-27
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) applied to code debugging typically rely on static context and lack mechanisms to dynamically retrieve relevant information from a codebase. Method: We propose an interactive information-retrieval paradigm, implemented in debug-gym, a lightweight, text-based, extensible debugging environment that integrates pdb, terminal simulation, and a command-line interface, and that explicitly models the action space and feedback mechanisms available to LLM agents. Contribution/Results: This work systematically incorporates interactive exploration into the development of LLM programming agents, supporting general information-seeking tasks. Experiments demonstrate significant improvements in root-cause localization and repair accuracy, validating the benefit of dynamic information acquisition for complex coding tasks. The framework establishes a benchmark and foundational infrastructure for LLM-driven interactive debugging.

📝 Abstract
Large Language Models (LLMs) are increasingly relied upon for coding tasks, yet in most scenarios it is assumed that all relevant information can be either accessed in context or matches their training data. We posit that LLMs can benefit from the ability to interactively explore a codebase to gather the information relevant to their task. To achieve this, we present a textual environment, namely debug-gym, for developing LLM-based agents in an interactive coding setting. Our environment is lightweight and provides a preset of useful tools, such as a Python debugger (pdb), designed to facilitate an LLM-based agent's interactive debugging. Beyond coding and debugging tasks, this approach can be generalized to other tasks that would benefit from information-seeking behavior by an LLM agent.
Problem

Research questions and friction points this paper is trying to address.

Enabling LLMs to interactively explore codebases for task-relevant information
Providing a lightweight textual environment for LLM-based debugging agents
Generalizing interactive information-seeking behavior beyond coding tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Interactive textual environment for LLM debugging
Includes Python debugger (pdb) for agent support
Generalizable to LLM information-seeking tasks
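The Innovation bullets above describe an observe-act text loop: the agent issues a tool command (a pdb query, a code rewrite, a run request) and receives the tool's textual output as its next observation. A minimal sketch of that interaction pattern is below; the names (`DebugEnv`, `step`, `rewrite`) are hypothetical and do not reflect the actual debug-gym API, and a real environment would expose pdb itself rather than the toy `run` tool shown here.

```python
import io
from contextlib import redirect_stdout
from dataclasses import dataclass, field

@dataclass
class DebugEnv:
    """Toy text-based debugging environment (hypothetical, not debug-gym's API).

    The agent interacts purely through text: it sends a command string
    (the action) and receives the tool's textual output (the observation).
    """
    code: str                               # buggy program under repair
    history: list = field(default_factory=list)

    def reset(self) -> str:
        """Start an episode; the initial observation shows the buggy program."""
        self.history.clear()
        return f"Buggy program loaded:\n{self.code}"

    def step(self, action: str) -> str:
        """Execute one text command and return the textual feedback."""
        self.history.append(action)
        if action.startswith("view"):
            return self.code
        if action.startswith("run"):
            # Execute the current program, capturing stdout or the error text.
            buf = io.StringIO()
            try:
                with redirect_stdout(buf):
                    exec(self.code, {})
                return buf.getvalue() or "(no output)"
            except Exception as exc:
                return f"{type(exc).__name__}: {exc}"
        if action.startswith("rewrite "):
            self.code = action[len("rewrite "):]
            return "Code rewritten."
        return f"Unknown command: {action!r}"
```

A short episode under this sketch: the agent runs the program, reads the traceback text, rewrites the code, and reruns it, which is the information-seeking loop the paper argues LLM agents benefit from.

```python
env = DebugEnv(code="print(1 / 0)")
env.reset()
print(env.step("run"))                   # error text becomes the observation
env.step("rewrite print(1 / 1)")
print(env.step("run"))
```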
Xingdi Yuan
Microsoft Research, Montreal
Natural Language Processing, Interactive Language Learning, Text-based Games, Coding Agent
Morgane M Moss
Microsoft Research Montréal, Mila, Université de Montréal
Charbel Feghali
McGill University
Chinmay Singh
Microsoft Research NYC
Darya Moldavskaya
Microsoft Research NYC
Drew MacPhee
Microsoft Research Montréal
Lucas Caccia
Microsoft Research
Deep Learning, Continual Learning, Natural Language Processing
Matheus Pereira
Microsoft Research Montréal
Minseon Kim
Microsoft Research
AI Safety, Robustness, Representation Learning
Alessandro Sordoni
Microsoft Research
Artificial Intelligence, Information Retrieval, Deep Learning
Marc-Alexandre Côté
Microsoft Research Montréal