🤖 AI Summary
To address the limited generalizability and robustness of high-level policies for real-time bipedal gait generation over complex terrain, this paper proposes a terrain-aware hierarchical reinforcement learning framework. The method has three parts: (1) a CNN-VAE extracts low-dimensional latent terrain representations, and the paper systematically analyzes how latent dimensionality affects learning efficiency and policy robustness; (2) sequences of recent terrain latents are combined with a reduced-order dynamics model to form a compact state representation; (3) a distillation method learns the latent representation directly from depth camera images, and simulated and real sensor data are compared as preliminary hardware validation. The framework is evaluated in the high-fidelity Agility Robotics simulator with realistic sensor noise, state estimation, and actuator dynamics; the results indicate robust, adaptable policies and suggest potential for hardware deployment.
📝 Abstract
This work introduces a hierarchical strategy for terrain-aware bipedal locomotion that integrates reduced-dimensional perceptual representations to enhance reinforcement learning (RL)-based high-level (HL) policies for real-time gait generation. Unlike end-to-end approaches, our framework leverages latent terrain encodings via a Convolutional Variational Autoencoder (CNN-VAE) alongside reduced-order robot dynamics, optimizing the locomotion decision process with a compact state. We systematically analyze the impact of latent space dimensionality on learning efficiency and policy robustness. Additionally, we extend our method to be history-aware, incorporating sequences of recent terrain observations into the latent representation to improve robustness. To address real-world feasibility, we introduce a distillation method to learn the latent representation directly from depth camera images and provide preliminary hardware validation by comparing simulated and real sensor data. We further validate our framework using the high-fidelity Agility Robotics (AR) simulator, incorporating realistic sensor noise, state estimation, and actuator dynamics. The results confirm the robustness and adaptability of our method, underscoring its potential for hardware deployment.
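To make the idea of a compact, history-aware state concrete, here is a minimal sketch of how recent terrain latents might be stacked with a reduced-order robot state before being passed to the high-level policy. It is an illustration under stated assumptions, not the authors' implementation: the class name `HistoryAwareState`, the zero-padding at start-up, the oldest-first ordering, and the latent and history sizes are all hypothetical, and the latents are assumed to come from an already trained CNN-VAE encoder.

```python
from collections import deque
from typing import Deque, List, Sequence


class HistoryAwareState:
    """Stacks the most recent terrain latents with a reduced-order state.

    Hypothetical helper: names, sizes, and zero-padding are assumptions
    made for illustration, not details taken from the paper.
    """

    def __init__(self, latent_dim: int, history_len: int) -> None:
        self.latent_dim = latent_dim
        self.history_len = history_len
        # Zero-filled so the observation has a fixed size from the first step.
        self._history: Deque[List[float]] = deque(
            ([0.0] * latent_dim for _ in range(history_len)),
            maxlen=history_len,
        )

    def push(self, latent: Sequence[float]) -> None:
        """Record the newest terrain latent; the oldest one is dropped."""
        if len(latent) != self.latent_dim:
            raise ValueError(
                f"expected latent of size {self.latent_dim}, got {len(latent)}"
            )
        self._history.append(list(latent))

    def observation(self, reduced_state: Sequence[float]) -> List[float]:
        """Return [z_oldest, ..., z_newest, reduced_state] as one flat list."""
        obs: List[float] = []
        for z in self._history:  # deque iterates oldest first
            obs.extend(z)
        obs.extend(reduced_state)
        return obs


# Usage: 2-D latents, a history of 3 steps, and a 2-D reduced-order state.
builder = HistoryAwareState(latent_dim=2, history_len=3)
builder.push([0.1, 0.2])
builder.push([0.3, 0.4])
obs = builder.observation([1.0, -0.5])
print(len(obs))  # 3 * 2 latent entries + 2 state entries = 8
```

The observation size is `history_len * latent_dim` plus the size of the reduced-order state, so it grows linearly with the latent dimensionality. That is one reason the choice of latent size can matter for the compactness of the policy input.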