RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation

📅 2025-06-22
🤖 AI Summary
Existing synthetic data for dual-arm robots generalizes poorly, primarily because (1) there is no efficient, scalable method for generating data for novel tasks, and (2) simulation environments are oversimplified and fail to capture real-world complexity. This paper introduces RoboTwin 2.0, a simulation framework that addresses both limitations. It constructs RoboTwin-OD, an object library of 731 instances across 147 categories; designs a code-generation pipeline that combines multimodal large language models (MLLMs) with simulation-in-the-loop refinement; and applies structured domain randomization along five axes (clutter, lighting, background, tabletop height, and language instructions) to improve sim-to-real generalization. RoboTwin 2.0 is instantiated on 50 dual-arm manipulation tasks across five robot embodiments and ships with more than 100,000 pre-collected expert trajectories. In evaluation, a VLA model fine-tuned on the dataset achieves a 367% relative improvement (42.0% vs. 9.0%) on real-world tasks in unseen scenes, zero-shot models trained only on the synthetic data achieve a 228% relative gain, and the code-generation success rate improves by 10.9%.
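
The 367% figure is a relative improvement, not an absolute one. As a quick check, the snippet below recomputes it from the 42.0% and 9.0% success rates reported in the abstract. The helper name `relative_improvement` is ours and purely illustrative.

```python
def relative_improvement(new: float, baseline: float) -> float:
    """Return the relative gain of `new` over `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline * 100.0


# Success rates taken from the abstract: 42.0% (fine-tuned) vs. 9.0% (baseline).
gain = relative_improvement(42.0, 9.0)
print(f"{gain:.1f}%")  # 366.7%, which rounds to the reported 367%
```

The absolute gain is 33 percentage points; dividing by the 9.0% baseline gives the relative figure.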

📝 Abstract
Simulation-based data synthesis has emerged as a powerful paradigm for enhancing real-world robotic manipulation. However, existing synthetic datasets remain insufficient for robust bimanual manipulation due to two challenges: (1) the lack of an efficient, scalable data generation method for novel tasks, and (2) oversimplified simulation environments that fail to capture real-world complexity. We present RoboTwin 2.0, a scalable simulation framework that enables automated, large-scale generation of diverse and realistic data, along with unified evaluation protocols for dual-arm manipulation. We first construct RoboTwin-OD, a large-scale object library comprising 731 instances across 147 categories, each annotated with semantic and manipulation-relevant labels. Building on this foundation, we develop an expert data synthesis pipeline that combines multimodal large language models (MLLMs) with simulation-in-the-loop refinement to generate task-level execution code automatically. To improve sim-to-real transfer, RoboTwin 2.0 incorporates structured domain randomization along five axes: clutter, lighting, background, tabletop height, and language instructions, thereby enhancing data diversity and policy robustness. We instantiate this framework across 50 dual-arm tasks spanning five robot embodiments, and pre-collect over 100,000 domain-randomized expert trajectories. Empirical results show a 10.9% gain in code generation success and improved generalization to novel real-world scenarios. A VLA model fine-tuned on our dataset achieves a 367% relative improvement (42.0% vs. 9.0%) on real-world tasks in unseen scenes, while zero-shot models trained solely on our synthetic data achieve a 228% relative gain, highlighting strong generalization without real-world supervision. We release the data generator, benchmark, dataset, and code to support scalable research in robust bimanual manipulation.
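
The abstract does not spell out how simulation-in-the-loop refinement works. One plausible reading is a generate, execute, and retry loop in which simulator feedback is passed back to the model. The sketch below illustrates that reading only: `refine`, `toy_generate`, and `toy_simulate` are hypothetical names, and the two toy functions are stand-ins for an MLLM call and a simulator run, not RoboTwin 2.0's actual API.

```python
from typing import Callable, Optional, Tuple

# Hypothetical interfaces, assumed for illustration only.
Generator = Callable[[str, Optional[str]], str]  # (task, feedback) -> code
Simulator = Callable[[str], Tuple[bool, str]]    # code -> (success, log)


def refine(task: str, generate: Generator, simulate: Simulator,
           max_rounds: int = 5) -> Optional[str]:
    """Regenerate task code until it succeeds in simulation or rounds run out."""
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        code = generate(task, feedback)  # model proposes execution code
        ok, log = simulate(code)         # run the proposal in the simulator
        if ok:
            return code
        feedback = log                   # feed the failure log into the next round
    return None


def toy_generate(task: str, feedback: Optional[str]) -> str:
    # Stand-in for an MLLM: adds the missing step once it sees feedback.
    return "grasp()" if feedback is None else "grasp(); place()"


def toy_simulate(code: str) -> Tuple[bool, str]:
    # Stand-in for a simulator: the task succeeds only if the object is placed.
    if "place" in code:
        return True, ""
    return False, "object was never placed"


result = refine("stack blocks", toy_generate, toy_simulate)
print(result)  # grasp(); place()
```

Capping the number of rounds keeps a task that never succeeds from looping forever; in that case the function returns `None`.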

Problem

Research questions and friction points this paper is trying to address.

Lack of scalable data generation for bimanual robotic tasks
Oversimplified simulations fail to capture real-world complexity
Need robust domain randomization for sim-to-real transfer

Innovation

Methods, ideas, or system contributions that make the work stand out.

Scalable simulation framework for diverse data
MLLMs with simulation-in-the-loop refinement
Structured domain randomization across five axes
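
To make the five randomization axes concrete, here is a minimal sketch that samples one scene configuration per episode. The axes (clutter, lighting, background, tabletop height, language instructions) come from the abstract. Every name, value range, and template below is an assumption chosen for illustration and is not taken from the RoboTwin 2.0 code.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneConfig:
    num_distractors: int     # clutter: extra objects on the table
    light_intensity: float   # lighting: arbitrary scale factor
    background: str          # background: texture name
    table_height_m: float    # tabletop height in meters
    instruction: str         # language instruction for the task


# Hypothetical options; the real framework's choices are not given in the source.
BACKGROUNDS = ["wood", "marble", "fabric"]
TEMPLATES = ["Pick up the {obj}.", "Grab the {obj} from the table.", "Lift the {obj}."]


def sample_scene(rng: random.Random, obj: str) -> SceneConfig:
    """Draw one value per axis, independently of the others."""
    return SceneConfig(
        num_distractors=rng.randint(0, 8),
        light_intensity=rng.uniform(0.3, 1.5),
        background=rng.choice(BACKGROUNDS),
        table_height_m=rng.uniform(0.70, 0.80),
        instruction=rng.choice(TEMPLATES).format(obj=obj),
    )


rng = random.Random(0)  # fixed seed so the sampled scenes are reproducible
scenes = [sample_scene(rng, "mug") for _ in range(3)]
for scene in scenes:
    print(scene)
```

Passing an explicit `random.Random` instance rather than using the module-level functions makes each data-collection run repeatable from its seed.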