RoboWheel: A Data Engine from Real-World Human Demonstrations for Cross-Embodiment Robotic Learning

📅 2025-12-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the high cost and poor generalizability of teleoperation-based data collection for robot learning. We propose a novel paradigm for automatically generating cross-morphology robot training data from monocular RGB(D) human hand–object interaction (HOI) videos. Our method integrates high-fidelity HOI reconstruction, physics-grounded reinforcement learning optimization, morphology-agnostic action representation, and cross-morphology trajectory retargeting, augmented by Isaac Sim simulation and domain randomization to form an end-to-end data generation pipeline. We provide the first empirical validation that hand–object interaction videos serve as high-quality supervision signals for robot learning. Crucially, our lightweight, universal action representation eliminates dependency on specific robot kinematics. Experiments demonstrate that the generated data achieves performance on par with teleoperated data across mainstream vision-language-action (VLA) and imitation learning models, while significantly improving cross-task and cross-morphology generalization. To foster community advancement, we open-source a large-scale, multimodal dataset.


📝 Abstract
We introduce RoboWheel, a data engine that converts human hand–object interaction (HOI) videos into training-ready supervision for cross-morphology robotic learning. From monocular RGB or RGB-D inputs, we perform high-precision HOI reconstruction and enforce physical plausibility via a reinforcement learning (RL) optimizer that refines hand–object relative poses under contact and penetration constraints. The reconstructed, contact-rich trajectories are then retargeted across embodiments (robot arms with simple end effectors, dexterous hands, and humanoids), yielding executable actions and rollouts. To scale coverage, we build a simulation-augmented framework on Isaac Sim with diverse domain randomization (embodiments, trajectories, object retrieval, background textures, hand-motion mirroring), which enriches the distributions of trajectories and observations while preserving spatial relationships and physical plausibility. The result is an end-to-end pipeline from video through reconstruction, retargeting, and augmentation to data acquisition. We validate the data on mainstream vision-language-action (VLA) and imitation learning architectures, demonstrating that trajectories produced by our pipeline are as stable as those from teleoperation and yield comparable continual performance gains. To our knowledge, this provides the first quantitative evidence that HOI modalities can serve as effective supervision for robotic learning. Compared with teleoperation, RoboWheel is lightweight: a single monocular RGB(D) camera suffices to extract a universal, embodiment-agnostic motion representation that can be flexibly retargeted across embodiments. We further assemble a large-scale multimodal dataset combining multi-camera captures, monocular videos, and public HOI corpora for training and evaluating embodied models.
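The abstract's four-stage flow (HOI reconstruction, RL-based physical refinement, cross-embodiment retargeting, augmentation) can be sketched as a chain of transforms. All function and field names below are hypothetical illustrations, not the paper's released code.

```python
# Hypothetical sketch of a RoboWheel-style pipeline: reconstruction ->
# physics-grounded refinement -> retargeting. Names are illustrative only.

def reconstruct_hoi(video_frames):
    """Stage 1: recover hand and object poses from monocular RGB(D) frames."""
    # Placeholder: hand keypoints plus a 6-DoF object pose per frame.
    return [{"hand_pose": [0.0] * 21, "object_pose": [0.0] * 6}
            for _ in video_frames]

def refine_physically(trajectory):
    """Stage 2: stand-in for the RL optimizer enforcing contact and
    non-penetration constraints on hand-object relative poses."""
    for step in trajectory:
        step["contact_consistent"] = True
    return trajectory

def retarget(trajectory, embodiment):
    """Stage 3: map the embodiment-agnostic motion to one robot morphology."""
    return [{"embodiment": embodiment, "action": step["object_pose"]}
            for step in trajectory]

def generate_data(video_frames, embodiments):
    """Full pipeline: one HOI video yields rollouts for every embodiment."""
    traj = refine_physically(reconstruct_hoi(video_frames))
    return {e: retarget(traj, e) for e in embodiments}

rollouts = generate_data(video_frames=range(5),
                         embodiments=["arm_gripper", "dex_hand", "humanoid"])
print(len(rollouts), len(rollouts["dex_hand"]))  # → 3 5
```

The key design point the paper emphasizes is that the intermediate motion representation is embodiment-agnostic, so the same refined trajectory fans out to many robot morphologies in the final stage.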
Problem

Research questions and friction points this paper is trying to address.

Converts human hand-object interaction videos into robotic training data
Retargets human motions to diverse robot embodiments for execution
Provides scalable simulation-augmented data generation for cross-embodiment learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Converts human videos to robot training data via reconstruction
Uses reinforcement learning to refine poses under physical constraints
Scales data with simulation-augmented domain randomization framework
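The domain-randomization axes named in the abstract (embodiments, trajectories, object retrieval, background textures, hand-motion mirroring) could be sampled per episode roughly as below; the axis values and noise scale are illustrative assumptions, not figures from the paper.

```python
import random

# Hypothetical per-episode domain randomization over the axes the paper
# lists. Option lists and the noise range are illustrative assumptions.
AXES = {
    "embodiment": ["arm_gripper", "dex_hand", "humanoid"],
    "background_texture": ["wood", "metal", "cloth"],
    "mirror_hand_motion": [True, False],
}

def sample_episode_config(rng):
    """Draw one randomized configuration for a simulated episode."""
    cfg = {axis: rng.choice(options) for axis, options in AXES.items()}
    # Small perturbation of the reference trajectory (meters, illustrative).
    cfg["trajectory_noise_std"] = rng.uniform(0.0, 0.02)
    return cfg

rng = random.Random(0)  # seeded for reproducible data generation
cfg = sample_episode_config(rng)
print(sorted(cfg))
```

Randomizing observations (textures, mirroring) separately from dynamics (embodiment, trajectory noise) matches the abstract's claim of enriching both distributions while keeping spatial relationships intact.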
Yuhong Zhang
Tsinghua University
Zihan Gao
Tsinghua University
Shengpeng Li
Synapath
Ling-Hao Chen
Ph.D. Student, Tsinghua University, IDEA Research
Computer Graphics, Computer Vision, Character Animation
Kaisheng Liu
Synapath
Runqing Cheng
Synapath
Xiao Lin
Synapath
Junjia Liu
Synapath
Zhuoheng Li
HKU
Jingyi Feng
PolyU
Ziyan He
Synapath
Jintian Lin
Synapath
Zheyan Huang
Tsinghua University
Zhifang Liu
School of Mathematical Sciences, Tianjin Normal University
Image Processing
Haoqian Wang
Tsinghua University