Entropy-Controlled Intrinsic Motivation Reinforcement Learning for Quadruped Robot Locomotion in Complex Terrains

📅 2025-12-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the premature convergence issue of Proximal Policy Optimization (PPO)-based deep reinforcement learning for quadrupedal locomotion on complex terrains—leading to suboptimal policies—this work proposes a novel reinforcement learning method integrating adaptive entropy regularization with intrinsic motivation. Built upon the PPO framework, the approach introduces a state-novelty-driven intrinsic reward and a differentiable entropy constraint to dynamically balance exploration and exploitation. Training is efficiently parallelized using Isaac Gym. Experiments across six representative challenging terrains demonstrate significant improvements: task reward increases by 4–12%, body pitch oscillation decreases by 23–29%, joint acceleration reduces by 20–32%, and actuation torque consumption drops by 11–20%. These results collectively indicate enhanced locomotion robustness and energy efficiency.
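The summary above describes two mechanisms layered onto PPO: a state-novelty-driven intrinsic reward and an adaptive entropy term. A minimal sketch of how such reward shaping and entropy adaptation might look — using an illustrative count-based novelty bonus and a target-entropy update rule, both of which are assumptions for illustration rather than the paper's actual implementation:

```python
import math
from collections import defaultdict

class NoveltyIntrinsicReward:
    """Count-based proxy for state novelty: bonus shrinks as a state is revisited."""
    def __init__(self, beta=0.1, bin_size=0.5):
        self.beta = beta          # scale of the intrinsic bonus (assumed hyperparameter)
        self.bin_size = bin_size  # discretization used to hash continuous states
        self.visits = defaultdict(int)

    def __call__(self, state):
        # Hash the continuous state into a coarse grid cell and count visits.
        key = tuple(round(s / self.bin_size) for s in state)
        self.visits[key] += 1
        return self.beta / math.sqrt(self.visits[key])

def adapt_entropy_coef(coef, policy_entropy, target_entropy, lr=0.01):
    """Raise the entropy bonus when policy entropy drops below target, lower it otherwise."""
    return max(0.0, coef + lr * (target_entropy - policy_entropy))

# The shaped reward (task reward + novelty bonus) would replace the raw task
# reward in the PPO advantage estimate.
novelty = NoveltyIntrinsicReward()
task_reward = 1.0
state = [0.2, -0.7]
shaped = task_reward + novelty(state)
```

Under this sketch, a first visit to a state yields the full bonus `beta`, and the entropy coefficient drifts upward whenever the policy's entropy collapses below the target, which is one simple way to delay premature convergence.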

📝 Abstract
Learning underpins intelligent behavior in both biological and artificial systems. Deep reinforcement learning algorithms descending from classical Proximal Policy Optimization (PPO) are widely used to train locomotion policies for quadrupedal robots because of their stability and sample efficiency. However, across these variants, training often converges prematurely, yielding suboptimal locomotion and reduced task performance. In this paper, we introduce Entropy-Controlled Intrinsic Motivation (ECIM), an entropy-based reinforcement learning algorithm built on the PPO framework that mitigates premature convergence by combining intrinsic motivation with adaptive exploration. To allow direct comparison with baselines, we train in Isaac Gym across six terrain categories: upward slopes, downward slopes, uneven rough terrain, ascending stairs, descending stairs, and flat ground. Our experiments consistently achieve better performance: task rewards increase by 4--12%, peak body pitch oscillation is reduced by 23--29%, joint acceleration decreases by 20--32%, and joint torque consumption declines by 11--20%. Overall, by combining entropy control with intrinsic motivation, ECIM improves stability across diverse terrains for quadrupedal locomotion while reducing energetic cost, making it a practical choice for complex robotic control tasks.
Problem

Research questions and friction points this paper is trying to address.

Reduces premature convergence in quadruped robot locomotion learning.
Improves stability and performance across diverse complex terrains.
Lowers energetic costs like joint torque and acceleration.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Entropy-controlled intrinsic motivation reinforcement learning algorithm
Reduces premature convergence via adaptive exploration
Improves stability and reduces energy consumption on terrains