AI Summary
Humanoid robots face significant challenges in loco-manipulation tasks because high-quality demonstration data is scarce in their high-dimensional action spaces. This work proposes an automatic data synthesis method that uses a small set of source demonstrations and contact-aware whole-body motion planning to transfer contact-rich skills to novel states. By combining locomotion and manipulation planning, the approach generates diverse, stable, and collision-free whole-body behaviors at scale, marking the first large-scale automated generation of loco-manipulation data for humanoid robots. The synthesized data generalizes across object poses, and the work introduces a new simulation benchmark comprising nine distinct tasks. Policies co-trained with this data achieve a 20% performance gain over those trained solely on real demonstrations, establishing a systematic foundation for studying data generation and visuomotor policy learning.
Abstract
Imitation learning is a promising approach for training humanoid robots to both walk and manipulate, but it requires a large number of demonstrations, which are time-intensive and difficult to collect via teleoperation. Existing data-generation algorithms can automatically synthesize demonstrations for manipulators, but they are ineffective on humanoids, whose high-dimensional composite action spaces span the arms, legs, and torso. We present HumanoidMimicGen, a method for generating humanoid legged loco-manipulation data. Our method adapts contact-rich whole-body skills from a handful of source demonstrations to new states, generalizing across changes in object pose. By interleaving these single- and dual-arm skills with whole-body locomotion and manipulation planning, the method generates stable, collision-free data across diverse scenes and layouts. To evaluate our approach, we introduce a new simulated loco-manipulation benchmark containing nine diverse tasks that test humanoid loco-manipulation capabilities. On this benchmark, we demonstrate that HumanoidMimicGen automatically generates large datasets for imitation learning and enables a systematic study of how data generation and policy learning decisions impact model performance. We show that whole-body visuomotor policies co-trained with data generated by HumanoidMimicGen outperform those trained only on real-world data by 20%.
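The abstract does not spell out how a source skill is adapted to a new object pose. A common approach, used by the original MimicGen for manipulators, is to replay each object-centric segment with the end-effector poses held fixed relative to the object frame, so that T_new_eef = T_new_obj · T_src_obj⁻¹ · T_src_eef. The sketch below assumes the contact-rich skills here are retargeted in a similar way; the helper names (`pose_z`, `invert_rigid`, `retarget_segment`) are hypothetical, and it deliberately ignores balance, collision checking, and the whole-body locomotion planning the method interleaves with these skills.

```python
import math
from typing import List

Mat = List[List[float]]  # 4x4 homogeneous rigid transform, row-major


def matmul(a: Mat, b: Mat) -> Mat:
    """Multiply two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def invert_rigid(t: Mat) -> Mat:
    """Invert a rigid transform [R p; 0 1] as [R^T  -R^T p; 0 1]."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]
    p = [t[i][3] for i in range(3)]
    p_new = [-sum(r_t[i][k] * p[k] for k in range(3)) for i in range(3)]
    return [r_t[i] + [p_new[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def pose_z(yaw: float, x: float, y: float, z: float) -> Mat:
    """Hypothetical helper: pose with a rotation about z (e.g. an object on a table)."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0, x], [s, c, 0.0, y], [0.0, 0.0, 1.0, z], [0.0, 0.0, 0.0, 1.0]]


def retarget_segment(src_eef: List[Mat], src_obj: Mat, new_obj: Mat) -> List[Mat]:
    """Map source end-effector poses to a new object pose, keeping each pose
    fixed in the object frame: T_new_eef = T_new_obj @ inv(T_src_obj) @ T_src_eef."""
    delta = matmul(new_obj, invert_rigid(src_obj))
    return [matmul(delta, t) for t in src_eef]


if __name__ == "__main__":
    src_obj = pose_z(0.0, 0.5, 0.0, 0.8)            # object pose in the source demo
    new_obj = pose_z(math.pi / 2, 1.0, 0.5, 0.8)    # object moved and rotated 90 degrees
    pre_grasp = pose_z(0.0, 0.4, 0.0, 0.9)          # hand 10 cm behind, 10 cm above object
    out = retarget_segment([pre_grasp], src_obj, new_obj)[0]
    print([round(out[i][3], 3) for i in range(3)])  # new hand position
```

In this toy case the hand sits 10 cm behind and 10 cm above the object in the source demo; after the object is moved and rotated by 90 degrees, the retargeted hand lands at roughly (1.0, 0.4, 0.9), preserving that relative offset. A full humanoid pipeline would still need to plan how the legs and torso reach that pose.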