🤖 AI Summary
To address the poor generalization and robustness of quadrupedal robots on unseen complex terrains, this paper proposes a hierarchical reinforcement learning framework that eliminates the need for additional offline training of the high-level policy. The low-level controller is trained with an on-policy actor-critic algorithm, taking footstep placement targets as goals and learning a value function alongside the policy. Crucially, the high-level policy is not pretrained offline; instead, it selects footstep targets through online optimization over the low-level value function. This design tightly couples high-level decision-making with low-level control, preserving training efficiency while significantly enhancing cross-terrain adaptability. Experiments demonstrate that, compared to end-to-end methods, the proposed framework achieves higher cumulative rewards, fewer collisions, and superior generalization, robustness, and real-time decision-making efficiency across diverse unseen complex terrains.
📝 Abstract
We propose a novel hierarchical reinforcement learning framework for quadruped locomotion over challenging terrain. Our approach incorporates a two-layer hierarchy in which a high-level policy (HLP) selects optimal goals for a low-level policy (LLP). The LLP is trained using an on-policy actor-critic RL algorithm and is given footstep placements as goals. We propose an HLP that requires no additional training or environment samples and instead operates via an online optimization process over the learned value function of the LLP. We demonstrate the benefits of this framework by comparing it with an end-to-end reinforcement learning (RL) approach: the hierarchical policy achieves higher rewards with fewer collisions across an array of different terrains, including terrains more difficult than any encountered during training.
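The core mechanism can be illustrated concretely. Since the LLP's critic already estimates the value of reaching a candidate footstep goal from the current state, the HLP can simply search over candidate goals and pick the one the critic scores highest, with no extra training. The sketch below uses random shooting as the online optimizer and a toy 2-D footstep-offset goal space; the goal parameterization, bounds, and value function are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def select_goal(value_fn, state, n_samples=256, bounds=(-0.3, 0.3), rng=None):
    """Pick the footstep goal that maximizes the low-level value function.

    Random-shooting stand-in for the paper's online optimization step;
    the 2-D foot-placement goal space and bounds are assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Sample candidate footstep offsets uniformly within the bounds.
    goals = rng.uniform(bounds[0], bounds[1], size=(n_samples, 2))
    # Score each candidate with the LLP's learned critic.
    values = np.array([value_fn(state, g) for g in goals])
    # Return the highest-value candidate as the HLP's chosen goal.
    return goals[int(np.argmax(values))]

# Toy stand-in for the learned critic: prefers goals near a
# terrain-dependent "safe" placement encoded in the state.
def toy_value(state, goal):
    target = state[:2]  # assume the first two dims encode a safe placement
    return -np.linalg.norm(goal - target)

state = np.array([0.1, -0.05, 0.0])
best = select_goal(toy_value, state, rng=np.random.default_rng(0))
```

In practice, the sampling loop could be replaced by gradient ascent or a cross-entropy-method optimizer over the critic, but the principle is the same: the HLP reuses the LLP's value estimates rather than learning its own.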