🤖 AI Summary
Dual-arm manipulation has long been held back by two problems. Data is scarce because real-world expert demonstrations are limited, and policies trained in simulation suffer from the simulation-to-reality (Sim2Real) gap. Together these create persistent bottlenecks in both data acquisition and evaluation. To address this, we propose RoboTwin, a generative digital twin benchmark for dual-arm manipulation. RoboTwin introduces the first end-to-end framework that combines 3D generative foundation models with large language models (LLMs). It automatically builds interactive simulation scenes from single 2D images and generates spatially aware robot motion code. RoboTwin narrows the Sim2Real gap in three ways: it models spatial relationships between objects, constructs high-fidelity digital twins, and transfers policies pre-trained in simulation to the real robot. On the COBOT Magic platform, policies are pre-trained on RoboTwin and fine-tuned with only a small number of real-world samples. Compared with policies trained solely on real-world data, they improve success rates by over 70% on single-arm tasks and by over 40% on dual-arm tasks.
📝 Abstract
In the rapidly advancing field of robotics, dual-arm coordination and complex object manipulation are essential capabilities for advanced autonomous systems. However, progress is severely limited by a scarcity of diverse, high-quality demonstration data and of evaluation benchmarks aligned with the real world. To address this, we introduce RoboTwin, a generative digital twin framework that uses 3D generative foundation models and large language models to produce diverse expert datasets and to provide a real-world-aligned evaluation platform for dual-arm robotic tasks. Specifically, RoboTwin creates varied digital twins of objects from single 2D images, generating realistic and interactive scenarios. It also introduces a spatial relation-aware code generation framework that combines object annotations with large language models to decompose tasks, determine spatial constraints, and generate precise robot motion code. Our framework offers a comprehensive benchmark with both simulated and real-world data, enabling standardized evaluation and closer alignment between simulated training and real-world performance. We validated our approach on the open-source COBOT Magic Robot platform. Policies pre-trained on RoboTwin-generated data and fine-tuned with limited real-world samples improve success rates by over 70% on single-arm tasks and by over 40% on dual-arm tasks, compared with models trained solely on real-world data. These gains demonstrate RoboTwin's potential to advance the development and evaluation of dual-arm robotic manipulation systems. Project Page: https://robotwin-benchmark.github.io/early-version/.
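The abstract does not specify the format of RoboTwin's object annotations or of the code its LLM generates. The sketch below is only a rough illustration of the kind of geometry that spatial relation-aware code generation must get right. It assumes a hypothetical annotation consisting of an object's world position, its yaw, and named contact points given in the object's own frame. From that annotation it derives a top-down grasp plan and decides which arm should execute it. Every name here (`ObjectAnnotation`, `to_world`, `choose_arm`, `plan_grasp`) is hypothetical, and so are the conventions (z up, +y toward the robot's left, fixed approach and lift heights). None of this is RoboTwin's actual API.

```python
"""Hypothetical sketch: turning an object annotation into grasp waypoints.

Not RoboTwin's API; all names and conventions are illustrative assumptions.
"""
import math
from dataclasses import dataclass

Vec3 = tuple[float, float, float]


@dataclass
class ObjectAnnotation:
    # Assumed annotation: object pose in the world frame (position + yaw about z)
    # and named contact points expressed in the object's local frame.
    position: Vec3
    yaw: float  # radians, rotation about the world z-axis
    contact_points: dict[str, Vec3]


def to_world(ann: ObjectAnnotation, local: Vec3) -> Vec3:
    """Rotate a local point by the object's yaw, then translate by its position."""
    c, s = math.cos(ann.yaw), math.sin(ann.yaw)
    lx, ly, lz = local
    px, py, pz = ann.position
    return (px + c * lx - s * ly, py + s * lx + c * ly, pz + lz)


def choose_arm(point: Vec3) -> str:
    """Pick the arm on the same side as the target (assumes +y is the robot's left)."""
    return "left" if point[1] >= 0 else "right"


def plan_grasp(ann: ObjectAnnotation, point_name: str,
               approach_height: float = 0.10, lift_height: float = 0.15):
    """Return (arm, steps) for a vertical top-down grasp at a named contact point.

    The spatial constraint assumed here: approach the contact point from directly
    above, close the gripper at the point, then lift straight up.
    """
    gx, gy, gz = to_world(ann, ann.contact_points[point_name])
    steps = [
        ("move", (gx, gy, gz + approach_height)),  # pre-grasp pose above the point
        ("move", (gx, gy, gz)),                    # descend to the contact point
        ("close_gripper", None),
        ("move", (gx, gy, gz + lift_height)),      # lift the object
    ]
    return choose_arm((gx, gy, gz)), steps


if __name__ == "__main__":
    mug = ObjectAnnotation(position=(0.3, -0.1, 0.75), yaw=math.pi / 2,
                           contact_points={"handle": (0.05, 0.0, 0.02)})
    arm, steps = plan_grasp(mug, "handle")
    print(arm)
    for step in steps:
        print(step)
```

In this sketch the LLM's role, as the abstract describes it, would be to decompose a task into steps like these and to choose the contact point and constraint. The numeric pose computation stays deterministic and is derived from the annotations. Keeping that split means the spatial reasoning does not depend on the language model producing exact coordinates.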