Immersion in the GitHub Universe: Scaling Coding Agents to Mastery

📅 2026-02-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the performance limitations of coding agents in real-world software engineering tasks, which stem from the scarcity of high-quality training data. To overcome this challenge, the authors propose ScaleSWE—a sandbox-based, multi-agent automated workflow that jointly orchestrates environment setup, unit test generation, and problem description synthesis to construct the first large-scale, highly diverse, and realistically complex software engineering dataset derived from six million GitHub pull requests. Requiring no human annotation, this approach substantially surpasses existing real-world datasets in both scale and fidelity. The resulting ScaleSWE dataset comprises 100,000 verified instances and is used to fine-tune the Qwen-30B-A3B-Instruct model, achieving a 64% resolution rate on SWE-Bench Verified—nearly tripling the performance of the baseline.

📝 Abstract
Achieving mastery in real-world software engineering tasks is fundamentally bottlenecked by the scarcity of large-scale, high-quality training data. Scaling such data has been limited by the complexity of environment setup, unit test generation, and problem statement curation. In this paper, we propose ScaleSWE, an automated, sandboxed multi-agent workflow designed to construct high-quality SWE data at scale. The system coordinates three specialized agents for environment setup, test creation, and problem description synthesis to process 6 million pull requests across 5,200 repositories, producing ScaleSWE-Data: 100k verified SWE instances, the largest such dataset to date. It substantially surpasses existing real-world datasets in repository diversity and reflects realistic task complexity. We further demonstrate the dataset's utility for training by distilling 71,498 high-quality trajectories and fine-tuning Qwen-30B-A3B-Instruct to produce ScaleSWE-Agent. Our agent achieves a 64% resolve rate on SWE-Bench Verified, a nearly three-fold improvement over the base model. ScaleSWE provides a scalable, reproducible approach for data construction to advance LLM-based software engineering. ScaleSWE will be publicly available.
Problem

Research questions and friction points this paper is trying to address.

software engineering
training data scarcity
environment setup
unit test generation
problem statement curation
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-agent workflow
large-scale SWE dataset
automated data curation
LLM-based software engineering
sandboxed environment
🔎 Similar Papers
No similar papers found.
Jiale Zhao
Gaoling School of Artificial Intelligence, Renmin University of China
Guoxin Chen
Gaoling School of Artificial Intelligence, Renmin University of China
Fanzhe Meng
Gaoling School of Artificial Intelligence, Renmin University of China
Minghao Li
Beihang University
Natural Language Processing
Jie Chen
Renmin University of China
Large Language Models, Natural Language Processing, Reinforcement Learning, Pre-training
Hui Xu
Gaoling School of Artificial Intelligence, Renmin University of China
Yongshuai Sun
Gaoling School of Artificial Intelligence, Renmin University of China
Xin Zhao
BandAI, ByteDance
Ruihua Song
Renmin University of China
AI-based creation, multi-modality chitchat, natural language understanding, information retrieval, information extraction
Yuan Zhang
Professor, School of Computer Science, Fudan University
systems and software security
Peng Wang
Renmin University of China
3D Perception
Cheng Chen
East China Normal University
Online Learning, Optimization, Numerical Linear Algebra
Jirong Wen
Gaoling School of Artificial Intelligence, Renmin University of China
Kai Jia
MIT