RoboMIND: Benchmark on Multi-embodiment Intelligence Normative Data for Robot Manipulation

📅 2024-12-18
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the scarcity of large-scale, high-quality, and diverse manipulation data for multimodal robot imitation learning, this work introduces RoboMIND, a benchmark dataset collected on a unified platform: it encompasses 96 object classes, 479 distinct tasks, 107k successful demonstration trajectories, and 5k failure cases annotated with their causes, accompanied by an Isaac Sim digital twin environment. Data is gathered through a standardized teleoperation protocol spanning four robotic embodiments (the Franka Emika Panda, the UR5e, the AgileX dual-arm robot, and a humanoid with dual dexterous hands) and records multi-view observations, proprioceptive robot states, natural language task descriptions, and failure attribution labels. The digital twin replicates the real-world tasks and assets, enabling low-cost collection of additional training data and closed-loop evaluation. Trained on RoboMIND, Vision-Language-Action (VLA) models achieve high manipulation success rates and strong cross-task generalization. To the authors' knowledge, RoboMIND is the largest multi-embodiment teleoperation dataset collected on a unified platform.

📝 Abstract
In this paper, we introduce RoboMIND (Multi-embodiment Intelligence Normative Data for Robot Manipulation), a dataset containing 107k demonstration trajectories across 479 diverse tasks involving 96 object classes. RoboMIND is collected through human teleoperation and encompasses comprehensive robotic-related information, including multi-view observations, proprioceptive robot state information, and linguistic task descriptions. To ensure data consistency and reliability for imitation learning, RoboMIND is built on a unified data collection platform and a standardized protocol, covering four distinct robotic embodiments: the Franka Emika Panda, the UR5e, the AgileX dual-arm robot, and a humanoid robot with dual dexterous hands. Our dataset also includes 5k real-world failure demonstrations, each accompanied by detailed causes, enabling failure reflection and correction during policy learning. Additionally, we created a digital twin environment in the Isaac Sim simulator, replicating the real-world tasks and assets, which facilitates the low-cost collection of additional training data and enables efficient evaluation. To demonstrate the quality and diversity of our dataset, we conducted extensive experiments using various imitation learning methods for single-task settings and state-of-the-art Vision-Language-Action (VLA) models for multi-task scenarios. By leveraging RoboMIND, the VLA models achieved high manipulation success rates and demonstrated strong generalization capabilities. To the best of our knowledge, RoboMIND is the largest multi-embodiment teleoperation dataset collected on a unified platform, providing large-scale and high-quality robotic training data. Our project is at https://x-humanoid-robomind.github.io/.
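The abstract describes each demonstration as bundling multi-view observations, proprioceptive state, a language task description, and, for failure cases, an annotated cause. A minimal sketch of what such a trajectory record might look like; the field names and types here are illustrative assumptions, not the dataset's actual schema:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class TrajectoryStep:
    """One timestep: per-camera observations plus the robot's proprioceptive state."""
    images: dict                  # view name -> image data (placeholder nested lists here)
    joint_positions: List[float]  # joint angles for this embodiment
    gripper_state: float          # e.g. 0.0 = open, 1.0 = closed

@dataclass
class Trajectory:
    """A full demonstration: embodiment, language instruction, steps, and outcome."""
    embodiment: str               # e.g. "franka_panda", "ur5e", "agilex_dual_arm", "humanoid"
    instruction: str              # natural language task description
    steps: List[TrajectoryStep] = field(default_factory=list)
    success: bool = True
    failure_cause: Optional[str] = None  # annotated only for failure demonstrations

# A toy successful demonstration with a single timestep.
demo = Trajectory(embodiment="franka_panda", instruction="pick up the mug")
demo.steps.append(
    TrajectoryStep(images={"wrist": [[0]]}, joint_positions=[0.0] * 7, gripper_state=0.0)
)
```

Failure demonstrations would carry `success=False` and a non-empty `failure_cause`, which is what enables the failure reflection and correction the paper describes.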
Problem

Research questions and friction points this paper is trying to address.

Large-scale, high-quality, and diverse manipulation data for robot imitation learning is scarce.
Data collected on heterogeneous platforms lacks the consistency needed for reliable policy training across embodiments.
Failure demonstrations with annotated causes, useful for failure reflection and correction, are rarely available.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-embodiment teleoperation dataset spanning four robot types
Unified data collection platform with a standardized protocol
Digital twin environment in Isaac Sim for low-cost data collection and closed-loop evaluation
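The digital twin supports closed-loop evaluation: a trained policy is rolled out in simulation and scored by how often it completes the task. A hedged sketch of that loop; `policy` and `run_episode` are hypothetical stand-ins, not RoboMIND or Isaac Sim APIs:

```python
import random

def evaluate_policy(policy, run_episode, num_episodes=50, seed=0):
    """Closed-loop evaluation sketch: roll the policy out in a simulated
    (digital-twin-style) environment and report the success rate.

    `run_episode(policy, rng)` is assumed to execute one full episode and
    return True on task success."""
    rng = random.Random(seed)  # seeded RNG so evaluations are reproducible
    successes = 0
    for _ in range(num_episodes):
        if run_episode(policy, rng):
            successes += 1
    return successes / num_episodes

# Toy stand-in: an episode runner that always succeeds, so the rate is 1.0.
rate = evaluate_policy(policy=None, run_episode=lambda p, rng: True, num_episodes=10)
```

In practice the simulator would reset the twin scene, step the policy on rendered observations, and check a task-specific success predicate in place of the toy `run_episode`.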
👥 Authors
Kun Wu (Beijing Innovation Center of Humanoid Robotics)
Chengkai Hou (Peking University)
Jiaming Liu (State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University; Beijing Academy of Artificial Intelligence)
Zhengping Che (X-Humanoid)
Xiaozhu Ju (Beijing Innovation Center of Humanoid Robotics)
Zhuqin Yang (Beijing Innovation Center of Humanoid Robotics)
Meng Li (Beijing Innovation Center of Humanoid Robotics)
Yinuo Zhao (Beijing Institute of Technology)
Zhiyuan Xu (Beijing Innovation Center of Humanoid Robotics)
Guang Yang (Beijing Innovation Center of Humanoid Robotics)
Zhen Zhao (Beijing Innovation Center of Humanoid Robotics)
Guangyu Li (New York University)
Zhao Jin (Beijing Innovation Center of Humanoid Robotics)
Lecheng Wang (Beijing Innovation Center of Humanoid Robotics)
Jilei Mao (Beijing Innovation Center of Humanoid Robotics)
Xinhua Wang (Beijing Innovation Center of Humanoid Robotics)
Shichao Fan (Beijing Innovation Center of Humanoid Robotics)
Ning Liu (Beijing Innovation Center of Humanoid Robotics)
Peifeng Ren (Beijing Innovation Center of Humanoid Robotics)
Qiang Zhang (Beijing Innovation Center of Humanoid Robotics)
Yaoxu Lyu (State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University)
Mengzhen Liu (State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University; Beijing Academy of Artificial Intelligence)
Jingyang He (State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University; Beijing Academy of Artificial Intelligence)
Yulin Luo (Peking University)
Zeyu Gao (Beijing Academy of Artificial Intelligence)
Chenxuan Li (State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University)
Chenyang Gu (Peking University)
Yankai Fu (State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University)
Di Wu (State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University)
Xingyu Wang (Nanjing University of Posts and Telecommunications)
Sixiang Chen (State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University; Beijing Academy of Artificial Intelligence)
Zhenyu Wang (State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University)
Pengju An (Peking University)
Siyuan Qian (State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University; Beijing Academy of Artificial Intelligence)
Shanghang Zhang (Peking University)
Jian Tang (Beijing Innovation Center of Humanoid Robotics)