🤖 AI Summary
Existing robot manipulation policies generalize poorly because they are trained on small, low-diversity simulated data or on real-world data collected in only a small number of environments. To address this, we introduce DROID (Distributed Robot Interaction Dataset), a large-scale, real-world robot manipulation dataset collected in a distributed fashion. It contains 76k demonstration trajectories (350 hours of interaction data) spanning 564 scenes and 84 tasks, collected over 12 months by 50 data collectors in North America, Asia, and Europe. All collectors used the same standardized Franka Panda robot setup, which made this remote, multi-continent collection effort possible. We open-source the full dataset, the policy learning code, and a detailed guide for reproducing the robot hardware setup. Policies trained with DROID achieve higher performance and generalize better than policies trained without it.
📝 Abstract
The creation of large, diverse, high-quality robot manipulation datasets is an important stepping stone on the path toward more capable and robust robotic manipulation policies. However, creating such datasets is challenging: collecting robot manipulation data in diverse environments poses logistical and safety challenges and requires substantial investments in hardware and human labour. As a result, even the most general robot manipulation policies today are mostly trained on data collected in a small number of environments with limited scene and task diversity. In this work, we introduce DROID (Distributed Robot Interaction Dataset), a diverse robot manipulation dataset with 76k demonstration trajectories or 350 hours of interaction data, collected across 564 scenes and 84 tasks by 50 data collectors in North America, Asia, and Europe over the course of 12 months. We demonstrate that training with DROID leads to policies with higher performance and improved generalization ability. We open source the full dataset, policy learning code, and a detailed guide for reproducing our robot hardware setup.
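The headline numbers in the abstract imply some useful per-unit averages, such as how long a typical trajectory lasts. The short Python sketch below derives them by simple division. The constant and function names are illustrative only and are not part of any DROID tooling. Because the inputs are rounded ("76k", "350 hours"), the results are approximate.

```python
# Approximate per-unit averages implied by the aggregate DROID statistics
# quoted in the abstract. Inputs are rounded, so outputs are rough estimates.
# Names here are hypothetical, chosen for illustration.
NUM_TRAJECTORIES = 76_000
TOTAL_HOURS = 350
NUM_SCENES = 564
NUM_COLLECTORS = 50


def dataset_averages(trajectories: int, hours: float, scenes: int, collectors: int) -> dict:
    """Return simple averages derived from aggregate dataset statistics."""
    total_seconds = hours * 3600  # convert total interaction time to seconds
    return {
        "seconds_per_trajectory": total_seconds / trajectories,
        "trajectories_per_scene": trajectories / scenes,
        "trajectories_per_collector": trajectories / collectors,
    }


stats = dataset_averages(NUM_TRAJECTORIES, TOTAL_HOURS, NUM_SCENES, NUM_COLLECTORS)
for name, value in stats.items():
    print(f"{name}: {value:.1f}")
```

With these inputs, a trajectory averages roughly 16.6 seconds, each scene contributes about 135 trajectories on average, and each collector about 1,520.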