AI Summary
This work addresses the challenge of maintaining both manipulation precision and dynamic stability in humanoid robots during long-horizon mobile manipulation, where leg motion induces cumulative drift of the end-effector in the world frame. The authors propose HiWET, a hierarchical reinforcement learning framework: a high-level policy generates subgoals that jointly optimize end-effector accuracy and base positioning in the world frame, while a low-level policy tracks these subgoals under stability constraints. The approach formulates mobile manipulation as a world-frame end-effector tracking problem and introduces a Kinematic Manifold Prior (KMP), embedded into the action space via residual learning, to reduce exploration dimensionality and mitigate kinematically invalid behaviors. In simulation, the method achieves precise and stable long-horizon tracking, and the low-level policy transfers zero-shot to a physical humanoid, where it maintains stable locomotion under diverse manipulation commands.
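The drift problem described above can be illustrated with a toy planar example. This is a minimal sketch, not the authors' setup: the base pose `(x, y, yaw)`, the per-step `drift` values, the target location, and the assumption that the arm reaches any commanded body-frame point exactly are all hypothetical. It compares a body-centric command computed once against a command recomputed from the world-frame target at the current base pose.

```python
import math

def world_to_body(p_world, base):
    """Express a world-frame point in the base frame; base = (x, y, yaw)."""
    bx, by, yaw = base
    dx, dy = p_world[0] - bx, p_world[1] - by
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * dx + s * dy, -s * dx + c * dy)

def body_to_world(p_body, base):
    """Express a base-frame point in the world frame."""
    bx, by, yaw = base
    c, s = math.cos(yaw), math.sin(yaw)
    return (bx + c * p_body[0] - s * p_body[1],
            by + s * p_body[0] + c * p_body[1])

def simulate(steps=100, drift=(0.001, 0.0005, 0.0002)):
    """Return (body_centric_error, world_frame_error) after base drift.

    Assumes an idealized arm that reaches any commanded body-frame point.
    """
    target = (0.6, 0.2)          # hypothetical world-frame target (m)
    base = (0.0, 0.0, 0.0)       # nominal base pose
    # Body-centric command: computed once and never corrected.
    fixed_cmd = world_to_body(target, base)
    for _ in range(steps):
        # Hypothetical unmodeled base drift accumulated by leg motion.
        base = (base[0] + drift[0], base[1] + drift[1], base[2] + drift[2])
    ee_body_centric = body_to_world(fixed_cmd, base)
    # World-frame tracking: command recomputed from the current base pose.
    ee_world_frame = body_to_world(world_to_body(target, base), base)

    def err(p):
        return math.hypot(p[0] - target[0], p[1] - target[1])

    return err(ee_body_centric), err(ee_world_frame)

body_err, world_err = simulate()
print(f"body-centric error: {body_err:.3f} m, world-frame error: {world_err:.2e} m")
```

In this idealized setting the fixed body-centric command inherits the full base drift, while the world-frame command cancels it. A real system would also have to handle state-estimation error and imperfect tracking, which this sketch ignores.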
Abstract
Humanoid loco-manipulation requires executing precise manipulation tasks while maintaining dynamic stability amid base motion and impacts. Existing approaches typically formulate commands in body-centric frames, failing to inherently correct cumulative world-frame drift induced by legged locomotion. We reformulate the problem as world-frame end-effector tracking and propose HiWET, a hierarchical reinforcement learning framework that decouples global reasoning from dynamic execution. The high-level policy generates subgoals that jointly optimize end-effector accuracy and base positioning in the world frame, while the low-level policy executes these commands under stability constraints. We introduce a Kinematic Manifold Prior (KMP) that embeds the manipulation manifold into the action space via residual learning, reducing exploration dimensionality and mitigating kinematically invalid behaviors. Extensive simulation and ablation studies demonstrate that HiWET achieves precise and stable end-effector tracking in long-horizon world-frame tasks. We validate zero-shot sim-to-real transfer of the low-level policy on a physical humanoid, demonstrating stable locomotion under diverse manipulation commands. These results indicate that explicit world-frame reasoning combined with hierarchical control provides an effective and scalable solution for long-horizon humanoid loco-manipulation.
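The abstract does not say how the KMP is constructed, so the following is only a generic sketch of the residual-action pattern it mentions: a kinematic prior proposes a command and the policy contributes a small bounded correction. The closed-form inverse kinematics of a planar two-link arm stands in for the prior; the link lengths `LINK_A` and `LINK_B`, the `scale` bound, and every function name are hypothetical and do not come from the paper.

```python
import math

LINK_A, LINK_B = 0.30, 0.25  # hypothetical link lengths (m)

def ik_prior(x, y):
    """Closed-form IK for a planar 2-link arm (one of the two elbow solutions).

    Stand-in for a kinematic prior; not the paper's KMP.
    """
    d2 = x * x + y * y
    cos_q2 = (d2 - LINK_A ** 2 - LINK_B ** 2) / (2 * LINK_A * LINK_B)
    # Clamp so out-of-reach targets map to the workspace boundary.
    cos_q2 = max(-1.0, min(1.0, cos_q2))
    q2 = math.acos(cos_q2)
    q1 = math.atan2(y, x) - math.atan2(LINK_B * math.sin(q2),
                                       LINK_A + LINK_B * math.cos(q2))
    return (q1, q2)

def forward(q):
    """Forward kinematics: joint angles -> end-effector position."""
    q1, q2 = q
    return (LINK_A * math.cos(q1) + LINK_B * math.cos(q1 + q2),
            LINK_A * math.sin(q1) + LINK_B * math.sin(q1 + q2))

def residual_action(target, policy_output, scale=0.1):
    """Joint command = kinematic prior + bounded learned residual.

    tanh keeps each residual within +/- scale radians, so the policy
    explores only a small neighborhood of the kinematically valid prior.
    """
    prior = ik_prior(*target)
    return tuple(p + scale * math.tanh(r) for p, r in zip(prior, policy_output))

# With a zero residual the command reproduces the prior solution.
q = residual_action((0.4, 0.2), (0.0, 0.0))
print(forward(q))
```

The design intent behind such a pattern is that the policy starts from a feasible command and learns only corrections, which is one plausible reading of "reducing exploration dimensionality and mitigating kinematically invalid behaviors".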