🤖 AI Summary
This work addresses the unclear individual contributions of the diverse contextual signals that current agents leverage in automated software engineering (SWE). The authors propose Oracle-SWE, a framework that systematically disentangles these signals and quantifies the upper-bound impact of idealized (oracle) versions of each on agent performance over established SWE benchmarks. They further evaluate the practical gains achieved when the oracle signals are approximated by strong language models. Through comprehensive ablation studies and performance analysis, the study not only elucidates the critical role of each signal type but also demonstrates that model-extracted approximations can substantially enhance the problem-solving capability of a baseline agent. These findings provide empirical grounding for the design of more effective autonomous programming systems.
📝 Abstract
Recent advances in language model (LM) agents have significantly improved automated software engineering (SWE). Prior work has proposed various agentic workflows and training strategies, and has analyzed failure modes of agentic systems on SWE tasks, focusing on several contextual information signals: Reproduction Test, Regression Test, Edit Location, Execution Context, and API Usage. However, the individual contribution of each signal to overall success remains underexplored, particularly the ideal contribution of each signal when the corresponding intermediate information is obtained perfectly. To address this gap, we introduce Oracle-SWE, a unified method that isolates and extracts oracle information signals from SWE benchmarks and quantifies the impact of each signal on agent performance. To further validate the observed patterns, we evaluate the performance gain from signals extracted by strong LMs and provided to a base agent, approximating real-world task-resolution settings. Together, these evaluations aim to guide research prioritization for autonomous coding systems.