SWE-Hub: A Unified Production System for Scalable, Executable Software Engineering Tasks

📅 2026-02-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a key limitation of current research on software-engineering agents: the lack of executable, scalable, and realistic data for training and evaluation, particularly with respect to environment reproducibility, system-level bug synthesis, and long-horizon task modeling. To overcome these challenges, the authors propose SWE-Hub, an end-to-end unified production system with three key components: Env Agent, which builds reproducible, multi-language containerized environments; Bug Agent, which generates cross-module, system-level regression tasks; and SWE-Architect, which translates natural-language requirements into repository-scale build tasks. By integrating automated environment setup, defect synthesis, and task generation, SWE-Hub forms a high-throughput, high-fidelity pipeline that spans multiple programming languages and the full software development lifecycle, increasing the scale, realism, and diversity of executable software-engineering tasks.

📝 Abstract
Progress in software-engineering agents is increasingly constrained by the scarcity of executable, scalable, and realistic data for training and evaluation. This scarcity stems from three fundamental challenges in existing pipelines: environments are brittle and difficult to reproduce across languages; synthesizing realistic, system-level bugs at scale is computationally expensive; and existing data consists predominantly of short-horizon repairs, failing to capture long-horizon competencies such as architectural consistency. We introduce SWE-Hub, an end-to-end system that operationalizes the data factory abstraction by unifying environment automation, scalable synthesis, and diverse task generation into a coherent production stack. At its foundation, the Env Agent establishes a shared execution substrate by automatically converting raw repository snapshots into reproducible, multi-language container environments with standardized interfaces. Built upon this substrate, the SWE-Scale engine addresses the need for high-throughput generation, combining cross-language code analysis with cluster-scale validation to synthesize massive volumes of localized bug-fix instances. The Bug Agent generates high-fidelity repair tasks by synthesizing system-level regressions involving cross-module dependencies, paired with user-like issue reports that describe observable symptoms rather than root causes. Finally, SWE-Architect expands the task scope from repair to creation by translating natural-language requirements into repository-scale build-a-repo tasks. By integrating these components, SWE-Hub establishes a unified production pipeline capable of continuously delivering executable tasks across the entire software engineering lifecycle.
Problem

Research questions and friction points this paper is trying to address.

executable data
scalable synthesis
software engineering agents
long-horizon tasks
reproducible environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

SWE-Hub
executable software engineering tasks
reproducible multi-language environments
scalable bug synthesis
repository-scale task generation
Authors

Yucheng Zeng (Qianfan Team, Baidu Inc.)
Shupeng Li (Qianfan Team, Baidu Inc.)
Daxiang Dong (Baidu; Deep Learning, Natural Language Processing, Data Mining)
Ruijie Xu (ShanghaiTech University; Machine Learning, Computer Vision, RLHF)
Zimo Chen (Qianfan Team, Baidu Inc.)
Liwei Zheng (Qianfan Team, Baidu Inc.)
Yuxuan Li (Qianfan Team, Baidu Inc.)
Zhe Zhou (Qianfan Team, Baidu Inc.)
Haotian Zhao (Qianfan Team, Baidu Inc.)
Lun Tian (Qianfan Team, Baidu Inc.)
Heng Xiao (Professor, University of Stuttgart; uncertainty quantification, computational fluid dynamics, scientific machine learning)
Tianshu Zhu (Qianfan Team, Baidu Inc.)
Longkun Hao (Qianfan Team, Baidu Inc.)
Jianmin Wu (Qianfan Team, Baidu Inc.)