daVinci-Env: Open SWE Environment Synthesis at Scale

📅 2026-03-13
🤖 AI Summary
This work addresses the lack of large-scale, executable, verifiable, and open-source training environments for software engineering agents, a gap that has hindered progress in the field. To bridge it, we introduce OpenSWE, the first large-scale, open-source framework for training Python-based software engineering agents, encompassing 12.8k code repositories and 45,320 executable Docker environments. A multi-agent synthesis pipeline deployed across a 64-node cluster automatically generates environments and evaluation scripts, and a quality-driven, difficulty-aware filtering mechanism distills these into 9,000 high-quality environments and 13,000 curated agent trajectories. Models trained with this framework, OpenSWE-32B and OpenSWE-72B, achieve state-of-the-art results among Qwen2.5-based models on SWE-bench Verified with scores of 62.4% and 66.0%, respectively, while also demonstrating significantly improved mathematical and scientific reasoning capabilities.

📝 Abstract
Training capable software engineering (SWE) agents demands large-scale, executable, and verifiable environments that provide dynamic feedback loops for iterative code editing, test execution, and solution refinement. However, existing open-source datasets remain limited in scale and repository diversity, while industrial solutions are opaque with unreleased infrastructure, creating a prohibitive barrier for most academic research groups. We present OpenSWE, the largest fully transparent framework for SWE agent training in Python, comprising 45,320 executable Docker environments spanning over 12.8k repositories, with all Dockerfiles, evaluation scripts, and infrastructure fully open-sourced for reproducibility. OpenSWE is built through a multi-agent synthesis pipeline deployed across a 64-node distributed cluster, automating repository exploration, Dockerfile construction, evaluation script generation, and iterative test analysis. Beyond scale, we propose a quality-centric filtering pipeline that characterizes the inherent difficulty of each environment, filtering out instances that are either unsolvable or insufficiently challenging and retaining only those that maximize learning efficiency. With $891K spent on environment construction and an additional $576K on trajectory sampling and difficulty-aware curation, the entire project represents a total investment of approximately $1.47 million, yielding about 13,000 curated trajectories from roughly 9,000 quality-guaranteed environments. Extensive experiments validate OpenSWE's effectiveness: OpenSWE-32B and OpenSWE-72B achieve 62.4% and 66.0% on SWE-bench Verified, establishing state of the art among the Qwen2.5 series. Moreover, SWE-focused training yields substantial out-of-domain improvements, including up to 12 points on mathematical reasoning and 5 points on science benchmarks, without degrading factual recall.
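The difficulty-aware filtering the abstract describes, discarding instances that are either unsolvable or insufficiently challenging, can be sketched as a simple pass-rate filter over sampled trajectories. This is a minimal illustration, not the paper's released implementation: the `EnvResult` fields, the use of trajectory pass rate as the difficulty signal, and the open interval `(0, 1)` as the retention band are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class EnvResult:
    env_id: str
    # Hypothetical field: fraction of sampled agent trajectories that
    # pass this environment's evaluation script.
    pass_rate: float

def filter_by_difficulty(results, lo=0.0, hi=1.0):
    """Keep environments that are neither unsolvable (pass_rate <= lo)
    nor trivially easy (pass_rate >= hi)."""
    return [r for r in results if lo < r.pass_rate < hi]

envs = [
    EnvResult("repo-a#1", 0.0),  # unsolvable: every sampled trajectory fails
    EnvResult("repo-b#7", 0.4),  # inside the useful difficulty band
    EnvResult("repo-c#3", 1.0),  # trivial: every sampled trajectory passes
]
kept = filter_by_difficulty(envs)
print([r.env_id for r in kept])  # → ['repo-b#7']
```

Tightening `lo` and `hi` (e.g. keeping only pass rates between 0.1 and 0.9) would trade dataset size for learning signal; the paper's actual thresholds are not stated in this summary.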
Problem

Research questions and friction points this paper is trying to address.

software engineering agents
executable environments
open-source datasets
environment synthesis
reproducibility
Innovation

Methods, ideas, or system contributions that make the work stand out.

SWE agent
environment synthesis
multi-agent pipeline
difficulty-aware curation
executable Docker environments
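The multi-agent pipeline tagged above chains the four stages the abstract names: repository exploration, Dockerfile construction, evaluation script generation, and iterative test analysis. The sketch below shows that control flow only; every function name and body is a hypothetical stub, since the real pipeline runs LLM agents across a 64-node cluster rather than local heuristics.

```python
def explore_repository(repo: str) -> dict:
    # Stub: would inspect the repo's layout, dependencies, and test suite.
    return {"repo": repo, "deps": [], "tests": []}

def build_dockerfile(info: dict) -> str:
    # Stub: would emit a Dockerfile pinning the discovered dependencies.
    return f"FROM python:3.11\n# environment for {info['repo']}\n"

def generate_eval_script(info: dict) -> str:
    # Stub: would wrap the repo's test suite in a pass/fail script.
    return "pytest -q"

def analyze_tests(dockerfile: str, eval_script: str, max_rounds: int = 3) -> bool:
    # Stub: would build the image, run the tests, and feed failures back
    # to the agents for repair; here we accept on the first round.
    for _ in range(max_rounds):
        if dockerfile and eval_script:
            return True
    return False

def synthesize_environment(repo: str) -> bool:
    """Run the four stages in sequence for one repository."""
    info = explore_repository(repo)
    dockerfile = build_dockerfile(info)
    eval_script = generate_eval_script(info)
    return analyze_tests(dockerfile, eval_script)
```

In the paper's setting, `synthesize_environment` would be invoked once per repository across the cluster, and only environments that survive the iterative test-analysis loop proceed to difficulty-aware filtering.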