🤖 AI Summary
Quadrupedal robots exhibit limited dexterity and poor generalization in manipulation tasks. Method: We propose a cross-embodiment imitation learning framework featuring (i) a human-to-robot cross-embodiment pretraining paradigm built on a unified teleoperation and data collection pipeline, (ii) the first structured multimodal manipulation dataset for the LocoMan platform, covering both unimanual and bimanual modes, and (iii) a modularized co-training architecture that aligns observation and action modalities across embodiments. Results: On six real-world manipulation tasks, our method improves the average success rate by 41.9% overall and by 79.7% under out-of-distribution (OOD) conditions. Pretraining with human data contributes a 38.6% improvement overall (82.7% under OOD) and enables consistently better performance with only half the amount of robot data, demonstrating strong sample efficiency and cross-embodiment transfer.
📝 Abstract
Quadrupedal robots have demonstrated impressive locomotion capabilities in complex environments, but equipping them with autonomous versatile manipulation skills in a scalable way remains a significant challenge. In this work, we introduce a cross-embodiment imitation learning system for quadrupedal manipulation, leveraging data collected from both humans and LocoMan, a quadruped equipped with multiple manipulation modes. Specifically, we develop a teleoperation and data collection pipeline, which unifies and modularizes the observation and action spaces of the human and the robot. To effectively leverage the collected data, we propose an efficient modularized architecture that supports co-training and pretraining on structured modality-aligned data across different embodiments. Additionally, we construct the first manipulation dataset for the LocoMan robot, covering various household tasks in both unimanual and bimanual modes, supplemented by a corresponding human dataset. We validate our system on six real-world manipulation tasks, where it achieves an average success rate improvement of 41.9% overall and 79.7% under out-of-distribution (OOD) settings compared to the baseline. Pretraining with human data contributes a 38.6% success rate improvement overall and 82.7% under OOD settings, enabling consistently better performance with only half the amount of robot data. Our code, hardware, and data are open-sourced at: https://human2bots.github.io.
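To make the co-training idea concrete, the sketch below shows one simple way to mix human and robot demonstrations after mapping both into shared observation and action spaces. All names, dimensions, and data formats here are hypothetical illustrations; the paper's actual architecture, alignment functions, and dataset schema are not specified in this summary.

```python
import numpy as np

OBS_DIM, ACT_DIM = 8, 4  # hypothetical shared observation/action sizes

def align_human(frame):
    """Map a raw human frame into the shared spaces (stub: pad/repeat to size)."""
    obs = np.resize(np.asarray(frame["hand_pose"], np.float32), OBS_DIM)
    act = np.resize(np.asarray(frame["hand_delta"], np.float32), ACT_DIM)
    return obs, act

def align_robot(frame):
    """Map a raw LocoMan frame into the same shared spaces (stub)."""
    obs = np.resize(np.asarray(frame["ee_pose"], np.float32), OBS_DIM)
    act = np.resize(np.asarray(frame["ee_delta"], np.float32), ACT_DIM)
    return obs, act

def cotrain_batch(human, robot, batch_size=4, human_ratio=0.5, rng=None):
    """Sample one mixed batch of modality-aligned (obs, act) pairs."""
    rng = rng or np.random.default_rng(0)
    n_h = int(round(batch_size * human_ratio))
    picks = (
        [align_human(human[i]) for i in rng.integers(0, len(human), n_h)]
        + [align_robot(robot[i]) for i in rng.integers(0, len(robot), batch_size - n_h)]
    )
    obs = np.stack([p[0] for p in picks])
    act = np.stack([p[1] for p in picks])
    return obs, act

# Toy data in each embodiment's native (hypothetical) format.
human_data = [{"hand_pose": np.ones(6), "hand_delta": np.ones(3)} for _ in range(10)]
robot_data = [{"ee_pose": np.zeros(8), "ee_delta": np.zeros(4)} for _ in range(10)]

obs, act = cotrain_batch(human_data, robot_data)
print(obs.shape, act.shape)  # (4, 8) (4, 4)
```

Because both embodiments land in the same `(obs, act)` representation, a single policy network can consume mixed batches directly, which is what allows human data to substitute for a large share of robot-collected data.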