SWE-Next: Scalable Real-World Software Engineering Tasks for Agents

📅 2026-03-21
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the difficulty of scaling real-world software engineering tasks: high-quality, verifiable examples are scarce, and repository environment setup is costly. The authors propose an execution-based verification framework that mines merged pull requests and retains base/merged commit pairs whose tests strictly improve without introducing regressions. By combining this self-verifying task filter with a strategy that reuses repository environments across temporally proximate commits, the method processes nearly 4,000 repositories in just 30 hours and 639 GB of storage. The result is a high signal-to-noise dataset of 2,308 task instances, which substantially improves the pass@1 performance of downstream models.
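The filtering criterion described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes per-commit test outcomes are already available as name-to-pass mappings, and the function name `is_self_verifying` is our own.

```python
# Hypothetical sketch of the strict test-improvement filter: keep a
# base/merged commit pair only if no test regresses and at least one
# test newly passes. Test outcomes are assumed to be dicts of
# test name -> passed (bool); this representation is our assumption.

def is_self_verifying(base: dict[str, bool], merged: dict[str, bool]) -> bool:
    # Regression check: every test passing at the base commit
    # must still pass at the merged commit.
    for name, passed in base.items():
        if passed and not merged.get(name, False):
            return False
    # Strict improvement: the merged commit must pass at least one
    # test that the base commit did not.
    base_passing = {n for n, p in base.items() if p}
    merged_passing = {n for n, p in merged.items() if p}
    return not merged_passing <= base_passing

# A pair with one newly passing test and no regressions is retained;
# a pair whose merged commit merely matches the base is discarded.
```

Commit pairs that change code without strictly improving executable test outcomes carry little training signal, which is why the filter requires both conditions rather than just the absence of regressions.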

📝 Abstract
Executable software engineering data is valuable for training SWE agents, but scaling it remains difficult for two reasons: only a small fraction of real repository changes yield verifiable, high-signal task instances, and naively building repository-specific environments quickly becomes the dominant systems cost. We present SWE-Next, an execution-grounded framework for scalable SWE task and trajectory collection. On the data side, SWE-Next mines real merged pull requests, executes candidate base/merged commit pairs, and retains only those that produce strict test improvements without regressions, yielding self-verifying instances. It also applies strict submission gating so that collected trajectories remain evidence-driven rather than speculative. On the systems side, SWE-Next introduces reusable repo-quarter profiles, which reuse the same environment across nearby commits in time while keeping each task run separate and reproducible. Using only 30 hours and 639GB of environment storage, SWE-Next processes 3,971 seed repositories and 102,582 candidate commit pairs mined from real merged PRs to construct a dataset of 2,308 self-verifying instances. Experiments show that SWE-Next improves downstream pass@1 with fewer or comparable training trajectories, indicating that its gains come not from a stronger trajectory generator, but from higher-signal execution-grounded supervision and more efficient data collection.
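The "repo-quarter profiles" idea from the abstract, reusing one built environment across temporally nearby commits of the same repository, can be illustrated with a simple bucketing scheme. This is a sketch under our own assumptions: the paper does not specify the bucketing key, and grouping by repository plus calendar quarter is one plausible reading of "repo-quarter".

```python
# Hypothetical sketch: bucket commits by (repo, year, quarter) so that
# all commits in a bucket can share one environment build, while each
# task still runs in its own isolated copy. The key shape is an assumption.
from collections import defaultdict
from datetime import datetime


def repo_quarter_key(repo: str, commit_time: datetime) -> tuple[str, int, int]:
    """Map a commit to its repository and the calendar quarter of its timestamp."""
    quarter = (commit_time.month - 1) // 3 + 1
    return (repo, commit_time.year, quarter)


def group_commits(
    commits: list[tuple[str, datetime]],
) -> dict[tuple[str, int, int], list[tuple[str, datetime]]]:
    """Group (repo, timestamp) commits; each bucket amortizes one environment build."""
    buckets: dict[tuple[str, int, int], list[tuple[str, datetime]]] = defaultdict(list)
    for repo, ts in commits:
        buckets[repo_quarter_key(repo, ts)].append((repo, ts))
    return buckets
```

The appeal of this kind of grouping is that dependency sets drift slowly, so commits a few weeks apart usually build against the same environment, which is what lets the pipeline cover ~4,000 repositories in 30 hours and 639 GB rather than building one environment per commit.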
Problem

Research questions and friction points this paper is trying to address.

software engineering agents
scalable task generation
execution-grounded data
repository environments
self-verifying instances
Innovation

Methods, ideas, or system contributions that make the work stand out.

execution-grounded
self-verifying tasks
scalable data collection
reusable environments
software engineering agents