🤖 AI Summary
This work addresses a limitation of existing code-editing interfaces, which tightly couple viewing, planning, and execution in one context, leading to accumulated irrelevant context and degraded agent performance. To overcome this, the authors propose a decoupled architecture with two subagents: a Viewer that retrieves task-relevant code on demand, and an Editor that performs modifications from high-level plans. The main agent can then focus on reasoning while offloading context-intensive subtasks to clean, focused contexts. The study further introduces an adaptive multi-mode editing mechanism that replaces the error-prone fixed find-and-replace format, and establishes a code-editing benchmark that predicts downstream agentic performance. With an editing model obtained by training Qwen3-8B with the GRPO algorithm, the approach improves the resolved rate on SWE-bench Verified by 2.1% and reduces inference cost by 17.9%.
📝 Abstract
Large language model agents have achieved remarkable progress on software engineering tasks, yet current approaches suffer from a fundamental context coupling problem: the standard code editing interface conflates code inspection, modification planning, and edit execution within a single context window, forcing agents to interleave exploratory viewing with strictly formatted edit generation. As a result, irrelevant information accumulates in the context and agent performance degrades. To address this, we propose SWE-Edit, which decomposes code editing into two specialized subagents: a Viewer that extracts task-relevant code on demand, and an Editor that executes modifications from high-level plans. This allows the main agent to focus on reasoning while delegating context-intensive operations to clean context windows. We further investigate what makes an effective editing model: observing that the prevalent find-and-replace format is error-prone, we train Qwen3-8B with GRPO to adaptively select editing modes, yielding improved editing efficiency over single-format baselines. On SWE-bench Verified, SWE-Edit improves the resolved rate by 2.1% while reducing inference cost by 17.9%. We additionally propose a code editing benchmark that reliably predicts downstream agentic performance, providing practical guidance for editing model selection. Our code is publicly available at https://github.com/microsoft/SWE-Edit.
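
To make the claim about the find-and-replace format concrete, here is a minimal sketch of how such an edit can fail and how a second editing mode sidesteps the failure. It is an illustration only, not the SWE-Edit implementation: the names `Edit`, `EditError`, `apply_find_replace`, and `apply_edit`, and the two mode labels, are hypothetical, and the abstract does not specify which modes the trained model chooses between.

```python
from dataclasses import dataclass


class EditError(ValueError):
    """Raised when an edit cannot be applied unambiguously."""


@dataclass
class Edit:
    # All field and mode names are hypothetical, chosen for illustration.
    mode: str           # "find_replace" or "rewrite"
    find: str = ""      # text to locate (find_replace mode only)
    replace: str = ""   # replacement text, or the full new file in rewrite mode


def apply_find_replace(source: str, find: str, replace: str) -> str:
    # The search text must occur exactly once. Zero matches (the model
    # misquoted the code) and multiple matches (the quote is ambiguous)
    # are the two ways this format typically breaks.
    count = source.count(find)
    if count != 1:
        raise EditError(f"expected exactly 1 match, found {count}")
    return source.replace(find, replace, 1)


def apply_edit(source: str, edit: Edit) -> str:
    # Dispatch on the mode carried by the edit itself.
    if edit.mode == "find_replace":
        return apply_find_replace(source, edit.find, edit.replace)
    if edit.mode == "rewrite":
        # Whole-file rewrite: costs more output tokens, but needs no exact match.
        return edit.replace
    raise EditError(f"unknown mode: {edit.mode}")
```

In this sketch, an edit whose `find` string appears twice in the file raises `EditError`, while the same change expressed as a `rewrite` always applies. An editing model that picks the mode per edit can trade the brevity of find-and-replace against the robustness of a rewrite, which is the kind of choice the adaptive mode selection described above is meant to capture.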