AI Summary
This work addresses long-range navigation failures in humanoid robots that arise when SE(2) reference trajectories are incompatible with complex terrain geometry. The authors propose a terrain-aware, reference-guided reinforcement learning framework that modulates SE(2)-controllable reference trajectories inside the training loop so that they remain consistent with the terrain. Desired footsteps are projected onto valid footholds, and the swing-foot and center-of-mass trajectories are adjusted to match the terrain geometry. The resulting policy exposes an SE(2) velocity interface that is compatible with standard navigation stacks. In simulation, terrain-conditioned references significantly improve reference tracking compared with environment-agnostic references. On hardware, the policy is integrated with a planner that combines model predictive control (MPC) and control barrier functions, and experiments on the Unitree G1 demonstrate closed-loop autonomous navigation over more than 70 meters of outdoor terrain, including rough ground and consecutive flights of stairs, with all sensing and computation onboard.
Abstract
We present a method for training reference-guided, perceptive reinforcement learning locomotion policies for humanoid robots in which reference trajectories are modulated during training to be consistent with terrain geometry. Aiming to deploy our method with standard navigation autonomy infrastructure, we synthesize SE(2)-controllable reference trajectories inside the RL training loop, projecting desired footsteps onto valid footholds and adjusting swing-foot and center-of-mass trajectories to match the terrain. The resulting policy exposes a clean SE(2) velocity interface compatible with standard navigation planners. In simulation, environment-conditioned references significantly improve reference tracking performance compared to environment-agnostic references. On hardware, we integrate the policy with an MPC + control barrier function planner and demonstrate long-horizon (>70 m) closed-loop autonomous navigation on the Unitree G1 through outdoor environments containing rough terrain and consecutive flights of stairs, with all sensing and computation onboard.
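To make the reference-modulation idea concrete, the following is a minimal sketch of one way the three steps could fit together: integrating an SE(2) velocity command into a nominal footstep, projecting that footstep onto a valid foothold, and shaping a swing-foot height profile between two foothold heights. It is not the authors' implementation. The function names, the grid-based validity map, the nearest-valid-cell projection rule, and the parabolic swing profile are all assumptions chosen for illustration.

```python
import math


def nominal_footstep(pose, v_cmd, step_time, lateral_offset):
    """Hypothetical helper: nominal (terrain-agnostic) foot target in the world frame.

    pose  = (x, y, yaw) of the body, v_cmd = (vx, vy, wz) in the body frame.
    Uses first-order integration over one step; lateral_offset > 0 places the
    foot to the left of the body, < 0 to the right.
    """
    x, y, yaw = pose
    vx, vy, wz = v_cmd
    x_new = x + (vx * math.cos(yaw) - vy * math.sin(yaw)) * step_time
    y_new = y + (vx * math.sin(yaw) + vy * math.cos(yaw)) * step_time
    yaw_new = yaw + wz * step_time
    fx = x_new - lateral_offset * math.sin(yaw_new)
    fy = y_new + lateral_offset * math.cos(yaw_new)
    return fx, fy


def project_to_valid_foothold(target_xy, valid, heights, cell):
    """Assumed projection rule: snap to the centre of the nearest valid cell.

    valid[i][j] and heights[i][j] describe the cell whose centre is at
    ((i + 0.5) * cell, (j + 0.5) * cell); i indexes x and j indexes y.
    Returns (x, y, z) of the chosen foothold.
    """
    best = None
    for i, row in enumerate(valid):
        for j, ok in enumerate(row):
            if not ok:
                continue
            cx, cy = (i + 0.5) * cell, (j + 0.5) * cell
            d2 = (cx - target_xy[0]) ** 2 + (cy - target_xy[1]) ** 2
            if best is None or d2 < best[0]:
                best = (d2, cx, cy, heights[i][j])
    if best is None:
        raise ValueError("no valid foothold in map")
    return best[1], best[2], best[3]


def swing_height(s, z_start, z_end, clearance):
    """Swing-foot height at phase s in [0, 1].

    Linear interpolation between the two foothold heights plus a parabolic
    bump that vanishes at both ends and adds `clearance` at mid-swing.
    """
    return (1.0 - s) * z_start + s * z_end + 4.0 * clearance * s * (1.0 - s)


if __name__ == "__main__":
    # Toy 3x3 map with 0.2 m cells: heights rise along x like stairs,
    # and the centre cell is marked as an invalid foothold.
    valid = [[True, True, True], [True, False, True], [True, True, True]]
    heights = [[0.0] * 3, [0.15] * 3, [0.3] * 3]
    target = nominal_footstep((0.1, 0.2, 0.0), (0.5, 0.0, 0.0), 0.4, 0.1)
    foothold = project_to_valid_foothold(target, valid, heights, 0.2)
    print("nominal target:", target, "projected foothold:", foothold)
```

In a training loop, a projection of this kind would run each time a new footstep is planned, so the reference the policy is rewarded for tracking never asks for a foot placement on an invalid cell. A real system would also need an edge-aware clearance check on stairs, which this swing profile does not provide.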