SWE-Mirror: Scaling Issue-Resolving Datasets by Mirroring Issues Across Repositories

📅 2025-09-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of scaling and rigorously validating training data for real-world code problems, this paper introduces SWE-Mirror: the first framework to semantically migrate authentic GitHub issues into repositories that already have configured Gym environments. Through cross-repository mirroring, semantic issue extraction, and task regeneration, SWE-Mirror automatically constructs executable, verifiable tasks across 40 open-source repositories in 4 programming languages. This approach overcomes two key bottlenecks: the lack of real-world grounding in synthetic tasks and the prohibitive cost of manual task curation. The resulting dataset comprises 60,671 high-quality, execution-verified tasks. Empirically, fine-tuning Qwen2.5-Coder-Instruct (7B/32B) on this data yields substantial gains on SWE-Bench-Verified, improving resolve rates by +21.8% and +46.0%, respectively, and achieving new state-of-the-art (SOTA) results.

📝 Abstract
Creating large-scale verifiable training datasets for issue-resolving tasks is a critical yet notoriously difficult challenge. Existing methods for automating the Gym environment setup process for real-world issues suffer from low success rates and high overhead. Meanwhile, synthesizing new tasks within existing Gym environments leaves the vast pool of authentic, human-reported problems untapped. To maximize the utilization of existing Gym environments as well as the rich issue-resolving history on GitHub, we introduce SWE-Mirror, a pipeline that distills a real-world issue's semantic essence, mirrors it into another repository with a configured Gym environment, and re-animates it as a verifiable issue-resolving task. SWE-Mirror reuses existing Gym environments along with the vast pool of issue-resolving history hosted on GitHub to construct a large-scale dataset of mirrored, authentic, and verifiable tasks. Applying SWE-Mirror to 40 repositories across 4 languages, we curated a dataset of 60,671 issue-resolving tasks and demonstrated its value by training and evaluating coding agents at various scales. Post-training experiments show that models trained on the dataset exhibit improved issue-resolving capabilities. Furthermore, by extending the dataset to over 12,000 high-quality trajectories, we established a new state-of-the-art (SOTA) among Qwen2.5-Coder-Instruct-based LLMs on the OpenHands agent framework, increasing the resolve rate on SWE-Bench-Verified by +21.8% for the 7B model and +46.0% for the 32B model and validating the effectiveness of our approach.
Problem

Research questions and friction points this paper is trying to address.

Creating large-scale verifiable datasets for issue-resolving tasks
Low success rates in automating Gym environment setup processes
Maximizing utilization of existing Gym environments and GitHub issue data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mirrors issues across repositories to reuse environments
Distills semantic essence to create verifiable tasks
Leverages GitHub history for large-scale dataset construction
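The three stages above (distill, mirror, verify) can be sketched as a minimal pipeline. This is an illustrative reconstruction, not the paper's implementation: all class and function names (`SourceIssue`, `distill_semantics`, `mirror_into`, `verify`) are hypothetical, and real verification would run the regenerated tests inside the target repository's Gym environment rather than take a boolean flag.

```python
from dataclasses import dataclass

# Hypothetical sketch of the SWE-Mirror stages described in the paper:
# (1) distill a real issue's semantic essence, (2) mirror it into a target
# repository that already has a configured Gym environment, (3) re-animate
# it as a verifiable task. All names and data structures are illustrative.

@dataclass
class SourceIssue:
    repo: str    # repository the authentic issue came from
    title: str
    body: str

@dataclass
class MirroredTask:
    target_repo: str
    description: str
    verified: bool

def distill_semantics(issue: SourceIssue) -> str:
    """Stage 1: strip repo-specific references, keep the behavioral essence."""
    essence = issue.body.replace(issue.repo, "<target>")
    return f"{issue.title}: {essence}"

def mirror_into(essence: str, target_repo: str) -> MirroredTask:
    """Stage 2: re-ground the essence in a repo with an existing Gym env."""
    return MirroredTask(
        target_repo=target_repo,
        description=essence.replace("<target>", target_repo),
        verified=False,
    )

def verify(task: MirroredTask, tests_pass_on_patch: bool) -> MirroredTask:
    """Stage 3: keep only tasks whose regenerated tests execute and pass.
    In practice this flag would come from running the Gym environment."""
    task.verified = tests_pass_on_patch
    return task

issue = SourceIssue(
    repo="requests",
    title="Timeout ignored",
    body="requests drops the timeout kwarg on redirects",
)
task = verify(mirror_into(distill_semantics(issue), "httpx"), tests_pass_on_patch=True)
print(task.verified, task.target_repo)
```

Execution-based verification is what makes the resulting tasks usable as training data: only mirrored issues whose regenerated tests actually run and pass in the target environment survive into the dataset.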
Junhao Wang (ByteDance Seed, The Chinese University of Hong Kong)
Daoguang Zan (ByteDance Seed)
Shulin Xin (ByteDance Seed)
Siyao Liu (ByteDance Seed)
Yurong Wu (ByteDance Seed)
Kai Shen (Associate Professor of Computer Science, University of Rochester)

Topics: Large Language Model · Software Engineering · Coding Agent