Immersion in the GitHub Universe: Scaling Coding Agents to Mastery

📅 2026-02-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the performance limitations of coding agents in real-world software engineering tasks, which stem from the scarcity of high-quality training data. To overcome this challenge, the authors propose ScaleSWE—a sandbox-based, multi-agent automated workflow that jointly orchestrates environment setup, unit test generation, and problem description synthesis to construct the first large-scale, highly diverse, and realistically complex software engineering dataset derived from six million GitHub pull requests. Requiring no human annotation, this approach substantially surpasses existing real-world datasets in both scale and fidelity. The resulting ScaleSWE dataset comprises 100,000 verified instances and is used to fine-tune the Qwen-30B-A3B-Instruct model, achieving a 64% resolution rate on SWE-Bench Verified—nearly tripling the performance of the baseline.

📝 Abstract
Achieving mastery in real-world software engineering tasks is fundamentally bottlenecked by the scarcity of large-scale, high-quality training data. Scaling such data has been limited by the complexity of environment setup, unit test generation, and problem statement curation. In this paper, we propose ScaleSWE, an automated, sandboxed multi-agent workflow designed to construct high-quality SWE data at scale. The system coordinates three specialized agents for environment setup, test creation, and problem description synthesis to process 6 million pull requests across 5,200 repositories, producing ScaleSWE-Data: 100k verified SWE instances, the largest such dataset to date. It substantially surpasses existing real-world datasets in repository diversity and reflects realistic task complexity. We further demonstrate the dataset's utility for training by distilling 71,498 high-quality trajectories and fine-tuning Qwen-30B-A3B-Instruct to produce ScaleSWE-Agent. Our agent achieves a 64% resolve rate on SWE-Bench Verified, a nearly three-fold improvement over the base model. ScaleSWE provides a scalable, reproducible approach for data construction to advance LLM-based software engineering. ScaleSWE will be publicly available.
Problem

Research questions and friction points this paper is trying to address.

software engineering
training data scarcity
environment setup
unit test generation
problem statement curation
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-agent workflow
large-scale SWE dataset
automated data curation
LLM-based software engineering
sandboxed environment
🔎 Similar Papers
No similar papers found.
Jiale Zhao
Gaoling School of Artificial Intelligence, Renmin University of China
Guoxin Chen
Gaoling School of Artificial Intelligence, Renmin University of China
Fanzhe Meng
Gaoling School of Artificial Intelligence, Renmin University of China
Minghao Li
Beihang University
Natural Language Processing
Jie Chen
Renmin University of China
Large Language Models, Natural Language Processing, Reinforcement Learning, Pre-training
Hui Xu
Gaoling School of Artificial Intelligence, Renmin University of China
Yongshuai Sun
Gaoling School of Artificial Intelligence, Renmin University of China
Xin Zhao
BandAI, ByteDance
Ruihua Song
Renmin University of China
AI-based creation, multi-modality chitchat, natural language understanding, information retrieval, information extraction
Yuan Zhang
Professor, School of Computer Science, Fudan University
systems and software security
Peng Wang
Renmin University of China
3D Perception
Cheng Chen
East China Normal University
Online Learning, Optimization, Numerical Linear Algebra
Jirong Wen
Gaoling School of Artificial Intelligence, Renmin University of China
Kai Jia
MIT