🤖 AI Summary
This work addresses cross-domain imitation learning from human videos to robotic manipulation, circumventing the need for costly and scarce real-robot demonstration data. To bridge three key domain gaps—visual representation mismatch, morphological disparity between human hands and robotic arms, and physical dynamics inconsistency—we propose a multi-domain co-training framework leveraging Dynamic Time Warping (DTW) and MixUp-based trajectory interpolation. First, human hand poses are kinematically retargeted into robot-executable trajectories; then, DTW aligns temporal sequences across domains to construct an intermediate domain, enabling joint optimization over a small set of teleoperated demonstrations and large-scale unlabeled human videos. The method requires no extensive labeled robot data. Evaluated on four real robotic platforms across four manipulation tasks, it achieves significant improvements in task success rate (+28.6% on average) and motion smoothness, demonstrating strong generalization and effective cross-domain transfer.
📝 Abstract
Learning robot manipulation from abundant human videos offers a scalable alternative to costly robot-specific data collection. However, domain gaps across visual, morphological, and physical aspects hinder direct imitation. To effectively bridge the domain gap, we propose ImMimic, an embodiment-agnostic co-training framework that leverages both human videos and a small amount of teleoperated robot demonstrations. ImMimic uses Dynamic Time Warping (DTW) with either action- or visual-based mapping to map retargeted human hand poses to robot joints, followed by MixUp interpolation between paired human and robot trajectories. Our key insights are (1) retargeted human hand trajectories provide informative action labels, and (2) interpolation over the mapped data creates intermediate domains that facilitate smooth domain adaptation during co-training. Evaluations on four real-world manipulation tasks (Pick and Place, Push, Hammer, Flip) across four robotic embodiments (Robotiq, Fin Ray, Allegro, Ability) show that ImMimic improves task success rates and execution smoothness, highlighting its efficacy to bridge the domain gap for robust robot manipulation. The project website can be found at https://sites.google.com/view/immimic.