🤖 AI Summary
Traditional velocity-tracking controllers for short-range SE(2) pose navigation in humanoid robots induce inefficient, “marching-like” locomotion. Method: This paper proposes an end-to-end reinforcement learning framework that directly optimizes pose reaching, bypassing intermediate velocity-trajectory tracking. It introduces a sparse reward function based on a constellation-inspired geometric structure to encourage natural, energy-efficient target-oriented motion; designs a multi-objective evaluation benchmark integrating energy consumption, task completion time, and step count; and employs SE(2)-encoded goal representations with curriculum learning to improve sim-to-real policy transfer. Contribution/Results: Experiments demonstrate significant improvements over baseline methods across all metrics, including reduced energy use, shorter execution time, and fewer steps. The learned policy is successfully deployed on a real humanoid robot platform, validating its practical efficacy and generalizability.
📝 Abstract
Humanoids operating in real-world workspaces must frequently execute task-driven, short-range movements to SE(2) target poses. To be practical, these transitions must be fast, robust, and energy efficient. While learning-based locomotion has made significant progress, most existing methods optimize for velocity tracking rather than direct pose reaching, resulting in inefficient, marching-style behavior when applied to short-range tasks. In this work, we develop a reinforcement learning approach that directly optimizes humanoid locomotion for SE(2) targets. Central to this approach is a new constellation-based reward function that encourages natural and efficient target-oriented movement. To evaluate performance, we introduce a benchmarking framework that measures energy consumption, time-to-target, and footstep count on a distribution of SE(2) goals. Our results show that the proposed approach consistently outperforms standard methods and enables successful transfer from simulation to hardware, highlighting the importance of targeted reward design for practical short-range humanoid locomotion.
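To make the SE(2) goal representation concrete, here is a minimal sketch of one common way to encode a target pose for a learned policy: express the goal in the robot's body frame and represent heading as (cos θ, sin θ) to avoid angle wrap-around. The function name and the exact encoding are illustrative assumptions, not the paper's actual implementation.

```python
import math

def encode_se2_goal(robot_pose, goal_pose):
    """Express a goal SE(2) pose in the robot's body frame.

    Poses are (x, y, theta) in world coordinates. The returned tuple
    (bx, by, cos(dtheta), sin(dtheta)) is a typical policy observation;
    this is a hypothetical sketch, not the paper's exact encoding.
    """
    rx, ry, rth = robot_pose
    gx, gy, gth = goal_pose
    dx, dy = gx - rx, gy - ry
    # Rotate the world-frame offset into the robot's body frame.
    c, s = math.cos(-rth), math.sin(-rth)
    bx = c * dx - s * dy
    by = s * dx + c * dy
    dth = gth - rth
    return (bx, by, math.cos(dth), math.sin(dth))

# A robot facing +y with a goal 1 m ahead sees the goal directly in
# front of it (positive body-frame x) with zero heading error.
print(encode_se2_goal((0.0, 0.0, math.pi / 2), (0.0, 1.0, math.pi / 2)))
```

Feeding the policy this relative, wrap-free encoding (rather than raw world coordinates) is a standard design choice that makes the learned behavior invariant to where in the workspace the maneuver starts.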