MotionTrans: Human VR Data Enable Motion-Level Learning for Robotic Manipulation Policies

📅 2025-09-22
🤖 AI Summary
This work addresses the challenge of directly learning novel motor patterns for robotic manipulation from human demonstrations. To this end, the authors propose MotionTrans: a framework integrating VR-based motion capture with a human-to-robot motion translation pipeline, coupled with a multi-task weighted co-training strategy that enables direct end-to-end transfer of motion-level behavior from humans to robot policies. After co-training on 30 human-robot tasks, MotionTrans directly transfers the motions of 13 tasks from human data to deployable robot policies, with 9 tasks achieving non-trivial zero-shot success rates. Under the pretraining-finetuning paradigm, average task success rates improve by 40%. The core contribution is establishing an end-to-end learning pathway from human VR motion data to embodied robot policies, unlocking the generalization potential of human motion data for robotic skill acquisition.

📝 Abstract
Scaling real robot data is a key bottleneck in imitation learning, leading to the use of auxiliary data for policy training. While other aspects of robotic manipulation such as image or language understanding may be learned from internet-based datasets, acquiring motion knowledge remains challenging. Human data, with its rich diversity of manipulation behaviors, offers a valuable resource for this purpose. While previous works show that using human data can bring benefits, such as improving robustness and training efficiency, it remains unclear whether it can realize its greatest advantage: enabling robot policies to directly learn new motions for task completion. In this paper, we systematically explore this potential through multi-task human-robot cotraining. We introduce MotionTrans, a framework that includes a data collection system, a human data transformation pipeline, and a weighted cotraining strategy. By cotraining 30 human-robot tasks simultaneously, we directly transfer motions of 13 tasks from human data to deployable end-to-end robot policies. Notably, 9 tasks achieve non-trivial success rates in a zero-shot manner. MotionTrans also significantly enhances pretraining-finetuning performance (+40% success rate). Through ablation studies, we also identify key factors for successful motion learning: cotraining with robot data and broad task-related motion coverage. These findings unlock the potential of motion-level learning from human data, offering insights into its effective use for training robotic manipulation policies. All data, code, and model weights are open-sourced at https://motiontrans.github.io/.
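The weighted cotraining strategy described in the abstract can be pictured as sampling training trajectories across tasks in proportion to per-task weights, so that robot data can be up-weighted relative to human VR data. The sketch below is illustrative only, assuming hypothetical task names, weights, and trajectory placeholders; it is not the paper's actual implementation.

```python
import random

def make_sampler(datasets, weights):
    """Build a weighted batch sampler over multiple tasks.

    datasets: {task_name: list of trajectories}
    weights:  {task_name: relative sampling weight}
    """
    tasks = list(datasets)
    probs = [weights[t] for t in tasks]
    total = sum(probs)
    probs = [p / total for p in probs]  # normalize to a distribution

    def sample_batch(batch_size):
        # Each element of the batch first picks a task by weight,
        # then a trajectory uniformly within that task's dataset.
        batch = []
        for _ in range(batch_size):
            task = random.choices(tasks, weights=probs, k=1)[0]
            batch.append(random.choice(datasets[task]))
        return batch

    return sample_batch

# Toy usage (hypothetical data): robot demonstrations weighted 2x
# relative to human VR demonstrations.
datasets = {
    "robot_pick": ["r_traj_1", "r_traj_2"],
    "human_pour": ["h_traj_1", "h_traj_2", "h_traj_3"],
}
weights = {"robot_pick": 2.0, "human_pour": 1.0}
sample = make_sampler(datasets, weights)
batch = sample(8)
```

In a real training loop, each sampled trajectory would be cut into observation-action chunks before being fed to the policy; the weighting alone controls how much each data source shapes the learned motions.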
Problem

Research questions and friction points this paper is trying to address.

Scaling real robot data is a bottleneck in imitation learning
Acquiring motion knowledge for robotic manipulation remains challenging
Enabling robot policies to directly learn new motions from human data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Human VR data transformation pipeline for robotic motion transfer
Multi-task weighted cotraining strategy with robot data
Direct motion-level learning from human to robot policies
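One piece of the transformation pipeline, translating a VR-tracked human wrist pose into a robot end-effector target, can be sketched as composing a fixed calibration transform with the tracked pose. The frame names, the identity calibration, and the simple rigid-transform composition below are assumptions for illustration, not the paper's actual pipeline.

```python
import numpy as np

def retarget_pose(T_headset_wrist, T_base_headset):
    """Map a wrist pose in the headset frame to the robot base frame.

    Both arguments are 4x4 homogeneous transforms;
    the result is T_base_ee = T_base_headset @ T_headset_wrist.
    """
    return T_base_headset @ T_headset_wrist

# Toy usage (hypothetical values): with an identity calibration,
# the retargeted end-effector pose equals the tracked wrist pose.
T_wrist = np.eye(4)
T_wrist[:3, 3] = [0.3, 0.0, 0.2]  # wrist 30 cm forward, 20 cm up
T_calib = np.eye(4)               # stand-in for a real calibration
T_ee = retarget_pose(T_wrist, T_calib)
```

A full pipeline would also handle hand-to-gripper retargeting and temporal smoothing; this sketch only shows the frame-change step.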
👥 Authors
Chengbo Yuan, Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University. Interests: Embodied AI, Computer Vision, Robot Learning, Agent.
Rui Zhou, Wuhan University.
Mengzhen Liu, State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University.
Yingdong Hu, Institute for Interdisciplinary Information Sciences, Tsinghua University. Interests: Computer Vision, Robotics.
Shengjie Wang, Tsinghua University. Interests: Robotics, Reinforcement Learning, Bionic Robotics.
Li Yi, Institute for Interdisciplinary Information Sciences, Tsinghua University.
Chuan Wen, Shanghai Jiao Tong University. Interests: Robotics, Machine Learning, Computer Vision.
Shanghang Zhang, Peking University. Interests: Embodied AI, Foundation Models.
Yang Gao, Institute for Interdisciplinary Information Sciences, Tsinghua University.