🤖 AI Summary
To address the scarcity of real-world demonstration data and the substantial sim-to-real gap in bimanual robot coordination, this paper introduces RoboTwin, a generative digital twin benchmark for bimanual manipulation. It is the first digital twin framework to integrate 3D generative foundation models with large language models (LLMs), and it couples them with a spatial-relation-aware code generation method that synthesizes interactive simulation scenes and bimanual motion control code end to end from a single 2D image. Evaluated on the COBOT Magic platform, policies pretrained on RoboTwin simulation data and fine-tuned with only a few real-world samples improve success rates by over 70% on single-arm tasks and over 40% on dual-arm tasks compared with policies trained solely on real-world data. This work establishes the first vision-simulation-action closed-loop benchmark and generative paradigm for bimanual manipulation, advancing embodied intelligence.
📝 Abstract
In the rapidly advancing field of robotics, dual-arm coordination and complex object manipulation are essential capabilities for developing advanced autonomous systems. However, the scarcity of diverse, high-quality demonstration data and of real-world-aligned evaluation benchmarks severely limits such development. To address this, we introduce RoboTwin, a generative digital twin framework that uses 3D generative foundation models and large language models to produce diverse expert datasets and to provide a real-world-aligned evaluation platform for dual-arm robotic tasks. Specifically, RoboTwin creates varied digital twins of objects from single 2D images, generating realistic and interactive scenarios. It also introduces a spatial-relation-aware code generation framework that combines object annotations with large language models to decompose tasks, determine spatial constraints, and generate precise robotic movement code. Our framework offers a comprehensive benchmark with both simulated and real-world data, enabling standardized evaluation and better alignment between simulated training and real-world performance. We validated our approach on the open-source COBOT Magic Robot platform. Policies pre-trained on RoboTwin-generated data and fine-tuned with limited real-world samples improve success rates by over 70% for single-arm tasks and over 40% for dual-arm tasks compared to models trained solely on real-world data, demonstrating significant potential for enhancing dual-arm robotic manipulation systems.
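To make the spatial-relation-aware code generation pipeline concrete, the sketch below shows one plausible shape for it: object annotations carry poses, symbolic spatial relations are resolved into goal positions, and a task decomposition is emitted as simple motion commands. This is a minimal illustration with hypothetical names (`ObjectAnnotation`, `resolve_constraint`, `generate_motion_code`), not the paper's actual implementation, and it substitutes a fixed task decomposition for the LLM call.

```python
# Hypothetical sketch of spatial-relation-aware code generation.
# The real system would query an LLM for the task decomposition;
# here the decomposition is supplied directly.
from dataclasses import dataclass

@dataclass
class ObjectAnnotation:
    name: str
    position: tuple   # (x, y, z) in the world frame
    grasp_axis: tuple # preferred approach direction for grasping

def resolve_constraint(target, reference, relation, offset=0.10):
    """Turn a symbolic spatial relation into a goal position for `target`
    relative to `reference`."""
    x, y, z = reference.position
    if relation == "above":
        return (x, y, z + offset)
    if relation == "left_of":
        return (x - offset, y, z)
    raise ValueError(f"unknown relation: {relation}")

def generate_motion_code(task_steps):
    """Emit grasp/move/release commands for each (object, relation, reference)
    sub-step, standing in for the LLM-generated control code."""
    lines = []
    for obj, relation, ref in task_steps:
        goal = resolve_constraint(obj, ref, relation)
        lines.append(f"grasp('{obj.name}', approach={obj.grasp_axis})")
        lines.append(f"move_to({goal})")
        lines.append(f"release('{obj.name}')")
    return "\n".join(lines)

# Example: place a block above a tray.
block = ObjectAnnotation("red_block", (0.3, 0.0, 0.02), (0, 0, -1))
tray = ObjectAnnotation("tray", (0.5, 0.2, 0.0), (0, 0, -1))
print(generate_motion_code([(block, "above", tray)]))
```

The key design point the paper's framework exploits is that spatial constraints are resolved from annotated object geometry rather than guessed by the language model, so the generated motion code stays grounded in the scene.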