🤖 AI Summary
This work addresses the challenge that models struggle to comprehend complex dependencies and accurately perform multi-file edits in repository-level code editing tasks. To this end, the authors propose Clean-PR, an approach that, for the first time, systematically converts real-world GitHub pull requests into high-quality, scalable, and structured Search/Replace edit training data. Through mid-training followed by agent-free supervised fine-tuning, the method enables models to internalise sophisticated editing capabilities without relying on agent-based frameworks at inference time. On SWE-bench Lite and Verified, Clean-PR achieves absolute improvements of 13.6% and 12.3%, respectively, substantially outperforming instruction fine-tuning baselines.
📝 Abstract
Repository-level code editing requires models to understand complex dependencies and execute precise multi-file modifications across a large codebase. While recent gains on SWE-bench rely heavily on complex agent scaffolding, it remains unclear how much of this capability can be internalised via high-quality training signals. To address this, we propose Clean Pull Request (Clean-PR), a mid-training paradigm that leverages real-world GitHub pull requests as a training signal for repository-level editing. We introduce a scalable pipeline that converts noisy pull request diffs into Search/Replace edit blocks through reconstruction and validation, yielding the largest publicly available corpus of its kind: 2 million pull requests spanning 12 programming languages. On this corpus, we perform a mid-training stage followed by agentless-aligned supervised fine-tuning with error-driven data augmentation. On SWE-bench, our model significantly outperforms the instruction-tuned baseline, achieving absolute improvements of 13.6% on SWE-bench Lite and 12.3% on SWE-bench Verified. These results demonstrate that repository-level code understanding and editing capabilities can be effectively internalised into model weights under a simplified, agentless protocol, without relying on heavy inference-time scaffolding.
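To make the Search/Replace edit format concrete, here is a minimal sketch of how such an edit block might be parsed and applied. The delimiter strings, helper names, and the exactly-once matching rule are illustrative assumptions (the conflict-marker style is common in agentless editing tools); the paper does not specify Clean-PR's exact block syntax here.

```python
import re

# Hypothetical Search/Replace edit block using conflict-marker-style
# delimiters; the exact format used by Clean-PR is an assumption.
EDIT_BLOCK = """\
<<<<<<< SEARCH
def greet(name):
    print("Hello " + name)
=======
def greet(name: str) -> None:
    print(f"Hello, {name}!")
>>>>>>> REPLACE
"""

def parse_edit_block(block: str) -> tuple[str, str]:
    """Split one edit block into its search text and replacement text."""
    m = re.search(
        r"<<<<<<< SEARCH\n(.*?)=======\n(.*?)>>>>>>> REPLACE",
        block,
        re.DOTALL,
    )
    if m is None:
        raise ValueError("malformed Search/Replace edit block")
    return m.group(1), m.group(2)

def apply_edit(source: str, block: str) -> str:
    """Apply the edit, failing loudly if the search text is absent or ambiguous."""
    search, replace = parse_edit_block(block)
    if source.count(search) != 1:
        raise ValueError("search text must match the source exactly once")
    return source.replace(search, replace)

original = 'def greet(name):\n    print("Hello " + name)\n'
patched = apply_edit(original, EDIT_BLOCK)
print(patched)
```

Requiring the search text to match exactly once is one plausible validation step: it lets an edit be rejected mechanically when the model hallucinates context, which is the kind of check the paper's reconstruction-and-validation pipeline could rely on.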