🤖 AI Summary
To address the poor generalization and robustness of quadrupedal robots on unseen complex terrains, this paper proposes a hierarchical reinforcement learning framework that eliminates the need for additional offline training of the high-level policy. The low-level controller is trained with an on-policy actor-critic algorithm, taking footstep placement targets as goals and learning a value function alongside the policy. Crucially, the high-level policy is not pretrained offline; instead, it selects footstep targets through online optimization over the low-level value function. This design tightly couples high-level decision-making with low-level control, preserving training efficiency while significantly enhancing cross-terrain adaptability. Experiments demonstrate that, compared to end-to-end methods, the proposed framework achieves higher cumulative rewards, fewer collisions, and superior generalization, robustness, and real-time decision-making efficiency across diverse unseen complex terrains.
📝 Abstract
We propose a novel hierarchical reinforcement learning framework for quadruped locomotion over challenging terrain. Our approach incorporates a two-layer hierarchy in which a high-level policy (HLP) selects optimal goals for a low-level policy (LLP). The LLP is trained using an on-policy actor-critic RL algorithm and is given footstep placements as goals. We propose an HLP that requires no additional training or environment samples and instead operates via an online optimization process over the learned value function of the LLP. We demonstrate the benefits of this framework by comparing it with an end-to-end reinforcement learning (RL) approach: the hierarchical policy achieves higher rewards with fewer collisions across an array of different terrains, including terrains more difficult than any encountered during training.
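The core mechanism can be illustrated concretely. Since the LLP's critic already estimates the value of reaching a candidate footstep goal from the current state, the HLP can simply search over candidate goals and pick the one the critic scores highest, with no extra training. The sketch below uses random shooting as the online optimizer and a toy 2-D footstep-offset goal space; the goal parameterization, bounds, and value function are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def select_goal(value_fn, state, n_samples=256, bounds=(-0.3, 0.3), rng=None):
    """Pick the footstep goal that maximizes the low-level value function.

    Random-shooting stand-in for the paper's online optimization step;
    the 2-D foot-placement goal space and bounds are assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Sample candidate footstep offsets uniformly within the bounds.
    goals = rng.uniform(bounds[0], bounds[1], size=(n_samples, 2))
    # Score each candidate with the LLP's learned critic.
    values = np.array([value_fn(state, g) for g in goals])
    # Return the highest-value candidate as the HLP's chosen goal.
    return goals[int(np.argmax(values))]

# Toy stand-in for the learned critic: prefers goals near a
# terrain-dependent "safe" placement encoded in the state.
def toy_value(state, goal):
    target = state[:2]  # assume the first two dims encode a safe placement
    return -np.linalg.norm(goal - target)

state = np.array([0.1, -0.05, 0.0])
best = select_goal(toy_value, state, rng=np.random.default_rng(0))
```

In practice, the sampling loop could be replaced by gradient ascent or a cross-entropy-method optimizer over the critic, but the principle is the same: the HLP reuses the LLP's value estimates rather than learning its own.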