🤖 AI Summary
This work addresses the inefficiency and error-proneness of existing automated program repair methods in large codebases, which often neglect previously successful repairs and therefore redo reasoning from scratch. The authors propose a result-conditioned backward reasoning distillation mechanism that, for the first time, reconstructs stepwise repair trajectories from verified patch outcomes without requiring fine-tuning or online search. This approach extracts transferable reasoning logic to guide fault localization and patch generation for new issues. Integrated with large language models, the method supports repair trajectory reconstruction, reasoning distillation, and file- or function-level localization and patch synthesis. Evaluated on SWE-Bench Lite, it significantly improves repair success rates, yielding absolute gains of 10.4%, 8.6%, and 10.3% with GPT-4o, DeepSeek-V3, and GPT-5, respectively.
📝 Abstract
Software issue resolution in large repositories is a long-range decision process: choices made during localization shape the space of viable edits, and missteps can compound into incorrect patches. Despite this, many LLM-based repair pipelines still operate in a reset-and-solve manner, producing fresh reasoning for every new issue instead of carrying forward what worked in past fixes. This is wasteful because repositories routinely contain earlier issues with overlapping structure, failure modes, or constraints, where prior repair experience could provide useful guidance. Existing approaches typically harvest this signal through forward-time trial procedures, such as repeated refinement or search, incurring high inference cost while still risking divergence from the eventual correct patch. We present an Outcome-Conditioned Reasoning Distillation (O-CRD) framework that uses resolved in-repository issues with verified patches as supervision. Starting from a historical fix, the method reconstructs a stage-wise repair trace backward from the verified outcome, then reuses the distilled guidance at inference time to steer file/function localization and patch synthesis, without fine-tuning or online search. On SWE-Bench Lite, this approach increases Pass@1 by 10.4% (absolute) with GPT-4o, 8.6% with DeepSeek-V3, and 10.3% with GPT-5, indicating that outcome-conditioned reuse of verified repairs can replace costly forward exploration for software issue resolution.
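The three stages the abstract describes (backward trace reconstruction from a verified patch, distillation into transferable guidance, and guidance-steered localization plus patch synthesis) could be sketched roughly as below. All function, type, and prompt names here are illustrative assumptions, not the paper's actual implementation, and the LLM is stubbed rather than called:

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# An "LLM" is modeled as any prompt -> completion function.
LLM = Callable[[str], str]

@dataclass
class ResolvedIssue:
    report: str           # original issue description
    verified_patch: str   # patch confirmed correct by the test suite

def reconstruct_trace(issue: ResolvedIssue, llm: LLM) -> str:
    """Backward pass: condition on the verified outcome and ask the model
    to reconstruct a stage-wise repair trajectory leading to it."""
    prompt = (
        "Given this resolved issue and its verified patch, reconstruct the "
        "repair trajectory backward (symptom -> file -> function -> edit):\n"
        f"Issue: {issue.report}\nPatch: {issue.verified_patch}"
    )
    return llm(prompt)

def distill_guidance(traces: List[str], llm: LLM) -> str:
    """Compress per-issue trajectories into transferable repair guidance."""
    joined = "\n---\n".join(traces)
    return llm(f"Distill reusable repair guidance from these trajectories:\n{joined}")

def repair(new_issue: str, guidance: str, llm: LLM) -> Tuple[str, str]:
    """Inference time: distilled guidance steers file/function localization
    first, then patch synthesis -- no fine-tuning, no online search."""
    location = llm(f"Guidance:\n{guidance}\nLocalize the fault for: {new_issue}")
    patch = llm(
        f"Guidance:\n{guidance}\nIssue: {new_issue}\n"
        f"Location: {location}\nWrite a patch."
    )
    return location, patch

if __name__ == "__main__":
    # Deterministic stub in place of GPT-4o / DeepSeek-V3 / GPT-5.
    stub: LLM = lambda prompt: f"<completion for {len(prompt)}-char prompt>"
    history = [ResolvedIssue("crash in parser on empty input", "fix: guard against None")]
    traces = [reconstruct_trace(i, stub) for i in history]
    guidance = distill_guidance(traces, stub)
    location, patch = repair("new crash in lexer", guidance, stub)
    print(location, patch)
```

The key structural point the sketch tries to capture is that supervision flows backward from a known-correct patch, so no forward trial-and-error (repeated refinement or search) is needed at inference.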