Attention-Based Map Encoding for Learning Generalized Legged Locomotion

📅 2025-06-11
🤖 AI Summary
Existing model-based approaches to dynamic legged locomotion are not robust to real-world uncertainties on complex terrain, while learning-based methods lack precision on terrain with sparse steppable areas. To address this, we propose an attention-based map encoding that fuses local depth terrain maps with the robot's proprioceptive state and is trained end-to-end as part of a PPO-based reinforcement learning controller. The attention mechanism is interpretable: it learns to focus on steppable regions for future footholds, which is presented as the first such application in legged locomotion, and it improves motion precision and cross-terrain generalization on sparse terrain. The method is validated on two morphologically distinct platforms, a 12-DoF quadruped and a 23-DoF humanoid. Both controllers achieve robust, agile, and precise dynamic locomotion under disturbances in real-world indoor and outdoor environments, including ones unseen during training.

📝 Abstract
Dynamic locomotion of legged robots is a critical yet challenging topic in expanding the operational range of mobile robots. It requires precise planning when possible footholds are sparse, robustness against uncertainties and disturbances, and generalizability across diverse terrains. While traditional model-based controllers excel at planning on complex terrains, they struggle with real-world uncertainties. Learning-based controllers offer robustness to such uncertainties but often lack precision on terrains with sparse steppable areas. Hybrid methods achieve enhanced robustness on sparse terrains by combining both methods but are computationally demanding and constrained by the inherent limitations of model-based planners. To achieve generalized legged locomotion on diverse terrains while preserving the robustness of learning-based controllers, this paper proposes to learn an attention-based map encoding conditioned on robot proprioception, which is trained as part of the end-to-end controller using reinforcement learning. We show that the network learns to focus on steppable areas for future footholds when the robot dynamically navigates diverse and challenging terrains. We synthesize behaviors that exhibit robustness against uncertainties while enabling precise and agile traversal of sparse terrains. Additionally, our method offers a way to interpret the topographical perception of a neural network. We have trained two controllers for a 12-DoF quadrupedal robot and a 23-DoF humanoid robot respectively and tested the resulting controllers in the real world under various challenging indoor and outdoor scenarios, including ones unseen during training.
Problem

Research questions and friction points this paper is trying to address.

Achieving robust legged locomotion on diverse terrains
Balancing precision and robustness in sparse foothold planning
Interpreting neural network perception for dynamic navigation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Attention-based map encoding for terrain focus
End-to-end reinforcement learning controller training
Proprioception-conditioned neural network for locomotion
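
The idea of a map encoding conditioned on proprioception can be illustrated with a small cross-attention sketch. This is not the authors' network: it assumes a single attention head, a hypothetical robot-centred height grid as the map, raw (x, y, height) cell features, and random projection matrices standing in for learned weights. The names `attention_map_encoding` and `init_params` are made up for this example. The robot state forms the query, each map cell contributes a key and a value, and the softmax weights over cells are what one would visualize to see where the controller is "looking".

```python
import numpy as np


def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - np.max(x))
    return e / np.sum(e)


def init_params(proprio_dim, embed_dim, rng):
    # Random projections as stand-ins for weights that would be learned with RL.
    return {
        "W_q": rng.normal(scale=0.1, size=(proprio_dim, embed_dim)),
        "W_k": rng.normal(scale=0.1, size=(3, embed_dim)),
        "W_v": rng.normal(scale=0.1, size=(3, embed_dim)),
    }


def attention_map_encoding(height_map, proprio, params):
    """Encode an (H, W) height map, conditioned on a (P,) proprioceptive state.

    Returns the (D,) map encoding and the (H, W) attention weights.
    """
    H, W = height_map.shape
    # Assumed cell features: normalized grid coordinates plus terrain height.
    ys, xs = np.meshgrid(np.linspace(-1, 1, H), np.linspace(-1, 1, W), indexing="ij")
    cells = np.stack([xs.ravel(), ys.ravel(), height_map.ravel()], axis=1)  # (H*W, 3)

    query = proprio @ params["W_q"]   # (D,)   from the robot state
    keys = cells @ params["W_k"]      # (H*W, D) from the map cells
    values = cells @ params["W_v"]    # (H*W, D)

    scores = keys @ query / np.sqrt(query.shape[0])  # scaled dot-product scores
    weights = softmax(scores)                        # one weight per map cell
    encoding = weights @ values                      # weighted sum of cell values
    return encoding, weights.reshape(H, W)


# Usage with synthetic inputs (grid size and state dimension are arbitrary).
rng = np.random.default_rng(0)
params = init_params(proprio_dim=48, embed_dim=16, rng=rng)
encoding, weights = attention_map_encoding(
    rng.normal(size=(11, 11)), rng.normal(size=48), params
)
print(encoding.shape, weights.shape)
```

In an end-to-end controller, the returned encoding would be concatenated with the proprioceptive input of the policy and the projections trained jointly with it; the weight grid is the part that supports the interpretability claim, since it can be overlaid on the terrain map.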
Authors

Junzhe He
ETH Zurich
Reinforcement Learning, Robot Learning

Chong Zhang
Robotic Systems Lab, ETH Zurich, 8092 Zurich, Switzerland

Fabian Jenelten
Robotic Systems Lab, ETH Zurich, 8092 Zurich, Switzerland

Ruben Grandia
Disney Research
Robotics, Control, Machine Learning

Moritz Bächer
Disney Research Zurich, Stampfenbachstrasse 48, 8006 Zurich, Switzerland

Marco Hutter
Professor of Robotics, ETH Zurich
Legged Robotics, Robotics, Control