🤖 AI Summary
To address the bottleneck of manually constructing high-quality software engineering data for Test-Driven Development (TDD) research, this paper proposes SWE-Flow: a framework that starts from unit tests and uses dynamic program analysis to build runtime dependency graphs (RDGs), enabling the automatic inversion of incremental development steps and the generation of structured, verifiable TDD task sequences. Its core innovation is the first "test-driven inversion" paradigm for synthetic data generation, modeling the path from test specifications to developer behavior end to end. The paper also introduces SWE-Flow-Eval, a benchmark derived from real-world GitHub projects (16,061 training and 2,020 test instances); fine-tuning open-source models on the training split significantly improves their performance on TDD coding tasks. All code, datasets, trained models, and Docker images are publicly released.
📝 Abstract
We introduce **SWE-Flow**, a novel data synthesis framework grounded in Test-Driven Development (TDD). Unlike existing software engineering data that rely on human-submitted issues, **SWE-Flow** automatically infers incremental development steps directly from unit tests, which inherently encapsulate high-level requirements. The core of **SWE-Flow** is the construction of a Runtime Dependency Graph (RDG), which precisely captures function interactions, enabling the generation of a structured, step-by-step *development schedule*. At each step, **SWE-Flow** produces a partial codebase, the corresponding unit tests, and the necessary code modifications, resulting in fully verifiable TDD tasks. With this approach, we generated 16,061 training instances and 2,020 test instances from real-world GitHub projects, creating the **SWE-Flow-Eval** benchmark. Our experiments show that fine-tuning open models on this dataset significantly improves performance in TDD-based coding. To facilitate further research, we release all code, datasets, models, and Docker images on [GitHub](https://github.com/Hambaobao/SWE-Flow).
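To make the RDG idea concrete, here is a minimal, hypothetical sketch (not the paper's implementation) of the two steps the abstract describes: running a unit test under dynamic tracing to record caller-to-callee edges among project functions, then topologically ordering that graph so dependencies are implemented before their callers, yielding an incremental development schedule. The tracer, the toy functions, and the schedule derivation are all illustrative assumptions.

```python
import sys
from collections import defaultdict
from graphlib import TopologicalSorter

def build_rdg(entry, *args):
    """Run `entry` (e.g. a unit test) under tracing and record
    caller -> callee edges: a toy runtime dependency graph."""
    edges = defaultdict(set)  # caller name -> set of callee names
    stack = []                # current call stack (function names)

    def tracer(frame, event, arg):
        if event == "call":
            name = frame.f_code.co_name
            if stack:
                edges[stack[-1]].add(name)
            stack.append(name)
        elif event == "return" and stack:
            stack.pop()
        return tracer

    sys.settrace(tracer)
    try:
        entry(*args)
    finally:
        sys.settrace(None)
    return dict(edges)

# Toy "project" whose incremental development we want to schedule.
def normalize(x):
    return x / 10

def score(x):
    return normalize(x) + 1

def test_score():
    assert score(20) == 3

rdg = build_rdg(test_score)
# TopologicalSorter treats the mapped values as predecessors, so
# callees come first: each step's test can pass once that step's
# function (and everything it depends on) is implemented.
schedule = list(TopologicalSorter(rdg).static_order())
```

In this toy run the schedule orders `normalize` before `score` before `test_score`, mirroring how a development schedule would surface leaf dependencies as the earliest TDD tasks; the real framework additionally emits a partial codebase and required modifications at each step.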