🤖 AI Summary
Problem: Software engineering (SWE) agents are bottlenecked by a scarcity of high-quality training data and of reliable test cases.
Method: This paper introduces an open-source large language model agent framework tailored for realistic SWE tasks. Its core innovations are: (1) a robust pipeline for synthesizing *verification-aware* test cases—ensuring functional correctness and behavioral fidelity; and (2) a scalable data construction methodology integrating trajectory distillation, tool-augmented reasoning, and synthetic-test-driven reinforcement learning to generate high-quality agent trajectories.
Contribution/Results: On the SWE-bench-Verified benchmark, the released models, SWE-Dev 7B and SWE-Dev 32B, achieve success rates of 23.4% and 36.6%, respectively, the top results among open-source SWE agents. All code, model weights, and training data are publicly released to support reproducibility.
📝 Abstract
Large language models (LLMs) have advanced rapidly from conversational problem solving to addressing real-world tasks involving tool use, such as software engineering (SWE). Recent LLM-powered toolkits, such as OpenAI Codex and Cursor, have offered end-to-end automation of the software development process. However, building effective SWE agents remains challenging due to the lack of high-quality training data and effective test cases. To address this issue, we present SWE-Dev, an SWE agent built upon open-source LLMs. First, we develop a robust pipeline to synthesize test cases for patch evaluation. Second, we scale up agent trajectories to construct the training data for building SWE-Dev. Experiments on the SWE-bench-Verified benchmark show that the SWE-Dev models can achieve top performance among all open SWE agents. Specifically, the success rates of the SWE-Dev 7B and 32B parameter models reach 23.4% and 36.6%, respectively, outperforming state-of-the-art open-source models. All code, models, and datasets are publicly available at https://github.com/THUDM/SWE-Dev.
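To make "test cases for patch evaluation" and "success rate" concrete, here is a minimal sketch of one common way such a check can be scored. It follows the fail-to-pass / pass-to-pass convention used by SWE-bench. Whether SWE-Dev's synthesized tests are scored exactly this way is an assumption. The names `CaseResult`, `patch_resolves_issue`, and `success_rate` are hypothetical and do not come from the SWE-Dev repository.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class CaseResult:
    """Outcome of one test case before and after applying a candidate patch (hypothetical type)."""
    name: str
    passed_before: bool
    passed_after: bool


def patch_resolves_issue(results: Sequence[CaseResult]) -> bool:
    """Return True if the patch fixes every failing test without breaking a passing one.

    Assumption: a task counts as resolved only when at least one test
    failed before the patch (otherwise the tests cannot verify a fix).
    """
    fail_to_pass = [r for r in results if not r.passed_before]
    pass_to_pass = [r for r in results if r.passed_before]
    if not fail_to_pass:
        return False
    fixed = all(r.passed_after for r in fail_to_pass)
    no_regression = all(r.passed_after for r in pass_to_pass)
    return fixed and no_regression


def success_rate(tasks: Sequence[Sequence[CaseResult]]) -> float:
    """Percentage of tasks whose patch is judged as resolving the issue."""
    if not tasks:
        return 0.0
    resolved = sum(patch_resolves_issue(t) for t in tasks)
    return 100.0 * resolved / len(tasks)
```

Under this convention, a patch that fixes the target test but breaks a previously passing test is not counted as a success, which is why reliable test cases matter for both evaluation and training signals.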