🤖 AI Summary
Generalizable loco-manipulation for humanoid robots in human environments is held back by two issues: reliance on hard-coded task definitions, which limits generalization, and costly real-world data collection. To address these, we propose DemoHLM, a hierarchical control framework with two levels: a low-level universal whole-body controller that maps whole-body motion commands to joint torques and provides omnidirectional mobility; and high-level closed-loop visuomotor manipulation policies, trained in simulation through a data generation and imitation learning pipeline. The framework needs only a single demonstration in simulation to achieve deployment on a real Unitree G1 robot equipped with an RGB-D camera. On the real robot it completes ten loco-manipulation tasks, remains robust under spatial variations, and its performance improves as the amount of synthetic data grows. This approach reduces reliance on hand-crafted task specifications and costly real-world data collection.
📝 Abstract
Loco-manipulation is a fundamental challenge for humanoid robots seeking versatile interaction in human environments. Although recent studies have made significant progress in humanoid whole-body control, loco-manipulation remains underexplored and often relies on hard-coded task definitions or costly real-world data collection, which limits autonomy and generalization. We present DemoHLM, a framework that enables generalizable loco-manipulation on a real humanoid robot from a single demonstration in simulation. DemoHLM adopts a hierarchy that integrates a low-level universal whole-body controller with high-level manipulation policies for multiple tasks. The whole-body controller maps whole-body motion commands to joint torques and provides omnidirectional mobility for the humanoid robot. The manipulation policies, learned in simulation via our data generation and imitation learning pipeline, command the whole-body controller with closed-loop visual feedback to execute challenging loco-manipulation tasks. Experiments show a positive correlation between the amount of synthetic data and policy performance, underscoring the effectiveness of our data generation pipeline and the data efficiency of our approach. Real-world experiments on a Unitree G1 robot equipped with an RGB-D camera validate the sim-to-real transferability of DemoHLM, demonstrating robust performance under spatial variations across ten loco-manipulation tasks.
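The two-level hierarchy described in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption for illustration and not taken from the paper: the `WholeBodyCommand` layout, the `policy_every` rate ratio, the callback signatures, and in particular the PD law inside `PlaceholderWholeBodyController`, which only stands in for the real whole-body controller so that the loop runs.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class WholeBodyCommand:
    # Hypothetical layout; the abstract only says "whole-body motion commands".
    base_velocity: Sequence[float]  # assumed (vx, vy, yaw_rate)
    joint_targets: Sequence[float]  # assumed one target per joint


class PlaceholderWholeBodyController:
    """Stand-in for the low-level controller (command -> joint torques).

    A plain PD law is used only to make the sketch runnable; it is NOT
    the controller described in the paper.
    """

    def __init__(self, kp: float = 20.0, kd: float = 0.5) -> None:
        self.kp = kp
        self.kd = kd

    def torques(self, cmd: WholeBodyCommand,
                q: Sequence[float], dq: Sequence[float]) -> List[float]:
        # Per-joint PD torque toward the commanded target.
        return [self.kp * (target - qi) - self.kd * dqi
                for target, qi, dqi in zip(cmd.joint_targets, q, dq)]


Observation = Tuple[object, Sequence[float], Sequence[float]]


def run_episode(policy: Callable[[object], WholeBodyCommand],
                controller: PlaceholderWholeBodyController,
                observe: Callable[[], Observation],
                apply_torques: Callable[[List[float]], None],
                steps: int,
                policy_every: int = 10) -> int:
    """Closed loop: the high-level policy re-reads the visual observation
    every `policy_every` control steps (an assumed ratio); the low-level
    controller runs at every step. Returns the number of policy calls."""
    cmd = None
    policy_calls = 0
    for t in range(steps):
        image, q, dq = observe()
        if t % policy_every == 0:
            cmd = policy(image)  # visual feedback closes the outer loop
            policy_calls += 1
        apply_torques(controller.torques(cmd, q, dq))
    return policy_calls
```

The split keeps task-specific logic in `policy`, so under this assumed structure a new task would swap the policy while the controller stays fixed, mirroring the "universal" controller shared by multiple manipulation policies in the abstract.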