Hierarchical Reinforcement Learning and Value Optimization for Challenging Quadruped Locomotion

📅 2025-06-24
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the poor generalization and robustness of quadrupedal robots on unseen complex terrains, this paper proposes a hierarchical reinforcement learning framework that eliminates the need for additional offline training of the high-level policy. The low-level controller employs an on-policy actor-critic algorithm, taking foot placement targets as input and jointly optimizing the value function. Crucially, the high-level policy is not pretrained offline; instead, it performs online optimization of gait target selection based on the low-level value function. This design enables tight coupling between high-level decision-making and low-level control, preserving training efficiency while significantly enhancing cross-terrain adaptability. Experiments demonstrate that, compared to end-to-end methods, the proposed framework achieves higher cumulative rewards, lower collision frequencies, and superior generalization, robustness, and real-time decision-making efficiency across diverse unseen complex terrains.

📝 Abstract
We propose a novel hierarchical reinforcement learning framework for quadruped locomotion over challenging terrain. Our approach incorporates a two-layer hierarchy in which a high-level policy (HLP) selects optimal goals for a low-level policy (LLP). The LLP is trained using an on-policy actor-critic RL algorithm and is given footstep placements as goals. We propose an HLP that does not require any additional training or environment samples and instead operates via an online optimization process over the learned value function of the LLP. We demonstrate the benefits of this framework by comparing it with an end-to-end reinforcement learning (RL) approach. We observe improvements in its ability to achieve higher rewards with fewer collisions across an array of different terrains, including terrains more difficult than any encountered during training.
Problem

Research questions and friction points this paper is trying to address.

Develop hierarchical RL for quadruped locomotion on challenging terrain
Optimize high-level policy via learned value function without retraining
Improve reward and reduce collisions across diverse unseen terrains
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical reinforcement learning for quadruped locomotion
High-level policy optimizes low-level policy goals
Online value function optimization without retraining
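The last bullet — choosing low-level goals by optimizing online over the learned value function, with no high-level training — can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the names `value_fn` and `select_goal`, and the one-dimensional step-length goal space, are assumptions made for the example; the paper's goals are footstep placements and its critic comes from the on-policy actor-critic LLP.

```python
import numpy as np

def select_goal(value_fn, state, candidate_goals):
    """High-level policy as pure online optimization: score each candidate
    goal with the low-level critic and return the argmax. No HLP training."""
    values = np.array([value_fn(state, g) for g in candidate_goals])
    return candidate_goals[int(np.argmax(values))]

# Toy stand-in for the learned critic: prefers goals near a 0.3 m step.
def toy_value_fn(state, goal):
    return -abs(goal - 0.3)

goals = np.linspace(0.0, 0.5, 11)  # candidate forward step lengths (m)
best = select_goal(toy_value_fn, state=None, candidate_goals=goals)
```

In practice the candidate set would be sampled footstep targets around the robot, and the same critic used to train the LLP scores them, which is what couples high-level decisions to low-level control without extra environment samples.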
Jeremiah Coholich
Institute for Robotics and Intelligent Machines, Georgia Institute of Technology, Atlanta, GA, USA
Muhammad Ali Murtaza
Georgia Institute of Technology
Robotics
Seth Hutchinson
Institute for Robotics and Intelligent Machines, Georgia Institute of Technology, Atlanta, GA, USA
Zsolt Kira
Associate Professor, Georgia Institute of Technology
Machine Learning, Perception, Robotics, Artificial Intelligence