🤖 AI Summary
Humanoid whole-body control is complicated by heterogeneous control modes across tasks such as navigation, loco-manipulation, and tabletop manipulation, and policies trained for one mode do not transfer to the others. This paper proposes an end-to-end neural controller that treats whole-body kinematic motion imitation as a shared abstraction for multimodal control, so that a single policy can switch seamlessly between task modes. Methodologically, it uses a multi-mode policy distillation framework that consolidates heterogeneous command objectives, such as root velocity tracking and upper-body joint angle tracking, into one unified policy, combining kinematic imitation supervision with behavioral cloning in a single training pipeline. Evaluations in simulation and on real humanoid robots show improved robustness and generalization across modes, and because the policy no longer has to be retrained for each control mode, deployment becomes substantially more efficient.
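The distillation step mentioned above can be illustrated with a toy behavioral-cloning loop, in which a student policy is regressed onto a fixed teacher's actions at sampled states. Everything below is a hypothetical stand-in (`teacher_policy`, `student_policy`, `distill`, the 2-D linear policies, i.i.d. state sampling, plain SGD on squared error). It is not HOVER's architecture or training recipe, which the summary does not detail. DAgger-style variants would instead collect states from the student's own rollouts.

```python
import random
from typing import List, Sequence, Tuple


def teacher_policy(obs: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for a trained motion-imitation teacher: a fixed linear map."""
    return [0.8 * obs[0] - 0.2 * obs[1], 0.5 * obs[1]]


def student_policy(W: List[List[float]], obs: Sequence[float]) -> List[float]:
    """Linear student: action_i = sum_j W[i][j] * obs[j]."""
    return [sum(w * o for w, o in zip(row, obs)) for row in W]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two action vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distill(steps: int = 500, lr: float = 0.1, seed: int = 0) -> Tuple[List[List[float]], float]:
    """Behavioral cloning: fit the student to teacher-labelled states with SGD."""
    rng = random.Random(seed)
    W = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(steps):
        obs = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        target = teacher_policy(obs)  # the teacher labels the sampled state
        pred = student_policy(W, obs)
        # Gradient of 0.5 * ||pred - target||^2 w.r.t. W[i][j] is (pred_i - target_i) * obs_j
        for i in range(2):
            for j in range(2):
                W[i][j] -= lr * (pred[i] - target[i]) * obs[j]
    # Held-out cloning loss on fresh states
    eval_obs = [[rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for _ in range(100)]
    loss = sum(mse(student_policy(W, o), teacher_policy(o)) for o in eval_obs) / len(eval_obs)
    return W, loss
```

Calling `distill()` drives `W` toward the teacher's coefficients `[[0.8, -0.2], [0.0, 0.5]]`. In a real system the teacher would be a trained whole-body motion-imitation policy and the student a neural network acting on the robot's observations.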
📝 Abstract
Humanoid whole-body control requires adapting to diverse tasks such as navigation, loco-manipulation, and tabletop manipulation, each demanding a different mode of control. For example, navigation relies on root velocity tracking, while tabletop manipulation prioritizes upper-body joint angle tracking. Existing approaches typically train individual policies tailored to a specific command space, limiting their transferability across modes. We present the key insight that full-body kinematic motion imitation can serve as a common abstraction for all these tasks and provide general-purpose motor skills for learning multiple modes of whole-body control. Building on this, we propose HOVER (Humanoid Versatile Controller), a multi-mode policy distillation framework that consolidates diverse control modes into a unified policy. HOVER enables seamless transitions between control modes while preserving the distinct advantages of each, offering a robust and scalable solution for humanoid control across a wide range of modes. By eliminating the need for policy retraining for each control mode, our approach improves efficiency and flexibility for future humanoid applications.
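The abstract does not specify how different control modes are presented to one policy. One common approach, assumed here purely for illustration, is to place every mode's command into a shared full-body command vector and zero out inactive parts with a binary mask. The layout (3 root-velocity commands followed by 4 upper-body joint targets), the mode names, and `build_policy_input` are all hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical layout: root velocity (vx, vy, yaw rate), then upper-body joint targets.
ROOT_DIM = 3
UPPER_JOINTS = 4  # illustrative only; real humanoids have more upper-body joints

# Hypothetical mode table: which command groups each mode activates.
MODES: Dict[str, Dict[str, bool]] = {
    "navigation": {"root": True, "upper": False},
    "tabletop": {"root": False, "upper": True},
    "loco_manipulation": {"root": True, "upper": True},
}


def build_policy_input(mode: str, root_vel: Sequence[float], upper_q: Sequence[float]) -> List[float]:
    """Return the masked command followed by the mask itself."""
    if len(root_vel) != ROOT_DIM or len(upper_q) != UPPER_JOINTS:
        raise ValueError("unexpected command dimensions")
    active = MODES[mode]
    mask = ([1.0 if active["root"] else 0.0] * ROOT_DIM
            + [1.0 if active["upper"] else 0.0] * UPPER_JOINTS)
    command = list(root_vel) + list(upper_q)
    masked = [c * m for c, m in zip(command, mask)]  # inactive entries become zero
    return masked + mask
```

For example, `build_policy_input("navigation", [0.5, 0.0, 0.1], [0.2, -0.1, 0.3, 0.0])` keeps the three velocity commands, zeroes the joint targets, and appends the mask `[1, 1, 1, 0, 0, 0, 0]`. Passing the mask alongside the command lets the policy distinguish an inactive command from one whose target happens to be zero.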