RPL: Learning Robust Humanoid Perceptive Locomotion on Challenging Terrains

📅 2026-02-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of achieving robust omnidirectional locomotion for humanoid robots on complex terrains, particularly under payload conditions. The authors propose RPL, a two-stage training framework: first, terrain-specific expert policies are trained using privileged height maps; then, these policies are distilled into a vision-based Transformer policy that relies solely on multi-view depth cameras. Key innovations include velocity-command-driven depth feature scaling and a randomized lateral masking mechanism to enhance adaptability to asymmetric observations and unknown terrain widths. An efficient multi-depth simulation system is developed, integrating ray-casting acceleration, parallel rendering of static and dynamic meshes, and realistic modeling of sensor noise and latency. Real-world experiments demonstrate that the robot robustly traverses challenging terrains—including 20° slopes, variable-step stairs, and 25 cm stepping stones separated by 60 cm gaps—while carrying a 2 kg payload.
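The two distillation-time techniques named above can be illustrated with a minimal sketch. This is a hypothetical reconstruction, not the paper's implementation: the camera ordering (`[left, front, right]`), the linear scaling law, and all parameter values are assumptions made for illustration.

```python
import numpy as np

def scale_depth_features(depth_feats, vel_cmd, base_gain=1.0):
    """Velocity-command-driven depth feature scaling (hypothetical form).

    Scales per-camera depth features by the commanded planar speed, so that
    faster commands amplify the visual signal. The linear law is an assumption.
    depth_feats: (num_cams, feat_dim) array
    vel_cmd: (3,) commanded [vx, vy, yaw_rate]
    """
    speed = np.linalg.norm(vel_cmd[:2])
    gain = base_gain * (1.0 + speed)  # assumed: gain grows linearly with speed
    return depth_feats * gain

def random_side_mask(depth_feats, rng, p=0.5):
    """Randomized lateral masking (hypothetical form).

    Assumes cameras are ordered [left, front, right]. With probability p,
    zeroes one lateral camera's features, simulating asymmetric observations
    and unknown terrain widths during distillation.
    """
    feats = depth_feats.copy()
    if rng.random() < p:
        side = rng.choice([0, 2])  # pick the left or right camera
        feats[side] = 0.0
    return feats
```

Masking a side forces the student policy not to rely on any single lateral view, which is what makes it robust when a terrain edge falls outside one camera's frustum.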

📝 Abstract
Humanoid perceptive locomotion has made significant progress and shows great promise, yet achieving robust multi-directional locomotion on complex terrains remains underexplored. To tackle this challenge, we propose RPL, a two-stage training framework that enables multi-directional locomotion on challenging terrains and remains robust with payloads. RPL first trains terrain-specific expert policies with privileged height map observations to master decoupled locomotion and manipulation skills across different terrains, and then distills them into a transformer policy that leverages multiple depth cameras to cover a wide range of views. During distillation, we introduce two techniques to robustify multi-directional locomotion, depth feature scaling based on velocity commands and random side masking, which are critical for asymmetric depth observations and unseen widths of terrains. For scalable depth distillation, we develop an efficient multi-depth system that ray-casts against both dynamic robot meshes and static terrain meshes in massively parallel environments, achieving a 5× speedup over the depth rendering pipelines in existing simulators while modeling realistic sensor latency, noise, and dropout. Extensive real-world experiments demonstrate robust multi-directional locomotion with payloads (2 kg) across challenging terrains, including 20° slopes, staircases with different step lengths (22 cm, 25 cm, 30 cm), and 25 cm by 25 cm stepping stones separated by 60 cm gaps.
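The sensor latency, noise, and dropout modeling mentioned in the abstract can be sketched as a simple post-processing stage on rendered depth frames. This is an illustrative sketch, not the paper's simulator: the frame-buffer latency model, the Gaussian noise, the zero-fill for dropped pixels, and all default values are assumptions.

```python
from collections import deque

import numpy as np

class DepthSensorModel:
    """Hypothetical sketch of simulated depth-sensor imperfections:
    latency (frames are delayed), additive Gaussian noise, and random
    pixel dropout. All parameter defaults are illustrative."""

    def __init__(self, latency_frames=2, noise_std=0.01, dropout_p=0.02, seed=0):
        # Buffer holds latency_frames old frames plus the current one.
        self.buffer = deque(maxlen=latency_frames + 1)
        self.noise_std = noise_std
        self.dropout_p = dropout_p
        self.rng = np.random.default_rng(seed)

    def step(self, depth):
        """Push the latest rendered depth image; return a delayed, corrupted frame."""
        self.buffer.append(depth)
        delayed = self.buffer[0]  # oldest buffered frame => simulated latency
        noisy = delayed + self.rng.normal(0.0, self.noise_std, delayed.shape)
        drop = self.rng.random(delayed.shape) < self.dropout_p
        noisy[drop] = 0.0  # dropped pixels read as zero depth
        return noisy
```

Training the student policy against delayed and corrupted frames rather than clean renders is a standard way to narrow the sim-to-real gap for depth-based policies.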
Problem

Research questions and friction points this paper is trying to address.

humanoid locomotion
challenging terrains
multi-directional locomotion
robust perception
payload handling
Innovation

Methods, ideas, or system contributions that make the work stand out.

perceptive locomotion
two-stage training
depth distillation
transformer policy
multi-depth simulation