FastDSAC: Unlocking the Potential of Maximum Entropy RL in High-Dimensional Humanoid Control

📅 2026-03-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenges of inefficient exploration and training instability in maximum-entropy reinforcement learning for high-dimensional humanoid control, which stem from the curse of dimensionality. To overcome these issues, the authors propose a Dimension-wise Entropy Modulation (DEM) mechanism that dynamically allocates exploration resources across individual action dimensions. This approach is further integrated with a continuous distributional critic to mitigate value overestimation in high-dimensional action spaces. The resulting method significantly enhances both the efficiency and stability of stochastic policy learning. Empirical evaluations on the HumanoidBench benchmark demonstrate that the proposed approach outperforms existing deterministic methods, achieving performance improvements of 180% and 400% on the Basketball and Balance Hard tasks, respectively.
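The summary above describes Dimension-wise Entropy Modulation as allocating exploration per action dimension rather than through a single global entropy temperature. A minimal sketch of that idea, assuming a diagonal Gaussian policy: each dimension's entropy gets its own temperature weight. The function name and the specific temperature values are hypothetical; the paper's actual DEM update rule is not reproduced here.

```python
import numpy as np

def dimwise_entropy_bonus(log_stds, alphas):
    """Per-dimension entropy of a diagonal Gaussian policy, weighted by
    per-dimension temperatures (a sketch of dimension-wise modulation;
    the paper's exact DEM mechanism is not reproduced here)."""
    # Differential entropy of N(mu, sigma) per dimension:
    # 0.5 * log(2*pi*e) + log_std
    per_dim_entropy = 0.5 * np.log(2 * np.pi * np.e) + np.asarray(log_stds)
    # A single global alpha would weight all dimensions equally;
    # here each dimension receives its own exploration budget.
    return float(np.dot(alphas, per_dim_entropy))

# Hypothetical 3-dimensional action space with unequal exploration budgets
bonus = dimwise_entropy_bonus(log_stds=[0.0, -1.0, 0.5],
                              alphas=[0.2, 0.05, 0.1])
```

In standard SAC-style maximum-entropy RL, a single scalar temperature scales the summed entropy; the per-dimension weighting above is what lets high-variance or collapsed dimensions be treated differently.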

📝 Abstract
Scaling Maximum Entropy Reinforcement Learning (RL) to high-dimensional humanoid control remains a formidable challenge, as the "curse of dimensionality" induces severe exploration inefficiency and training instability in expansive action spaces. Consequently, recent high-throughput paradigms have largely converged on deterministic policy gradients combined with massive parallel simulation. We challenge this compromise with FastDSAC, a framework that effectively unlocks the potential of maximum entropy stochastic policies for complex continuous control. We introduce Dimension-wise Entropy Modulation (DEM) to dynamically redistribute the exploration budget and enforce diversity, alongside a continuous distributional critic tailored to ensure value fidelity and mitigate high-dimensional value overestimation. Extensive evaluations on HumanoidBench and other continuous control tasks demonstrate that rigorously designed stochastic policies can consistently match or outperform deterministic baselines, achieving notable gains of 180% and 400% on the challenging Basketball and Balance Hard tasks.
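The abstract's second ingredient is a distributional critic that learns a return distribution rather than a scalar value, which helps curb overestimation. A generic quantile-regression critic loss illustrates the idea; this is a standard QR-style sketch, not the paper's specific continuous distributional critic.

```python
import numpy as np

def quantile_huber_loss(pred_quantiles, target, kappa=1.0):
    """Quantile-regression Huber loss used by distributional critics
    (a generic sketch; FastDSAC's exact critic is not reproduced here)."""
    n = len(pred_quantiles)
    taus = (np.arange(n) + 0.5) / n            # quantile midpoints
    u = target - np.asarray(pred_quantiles)    # TD error per quantile
    # Huber penalty: quadratic near zero, linear beyond kappa
    huber = np.where(np.abs(u) <= kappa,
                     0.5 * u ** 2,
                     kappa * (np.abs(u) - 0.5 * kappa))
    # Asymmetric weights push each estimate toward its target quantile,
    # penalizing overestimation and underestimation differently
    weight = np.abs(taus - (u < 0).astype(float))
    return float(np.mean(weight * huber))

# Two-quantile critic predicting [0, 0] against a scalar target of 1
loss = quantile_huber_loss(pred_quantiles=[0.0, 0.0], target=1.0)
```

Because the critic models a full distribution, downstream targets can use robust statistics of the quantiles instead of a single max, which is one common route to mitigating value overestimation.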
Problem

Research questions and friction points this paper is trying to address.

Maximum Entropy Reinforcement Learning
High-Dimensional Humanoid Control
Curse of Dimensionality
Exploration Inefficiency
Training Instability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Maximum Entropy Reinforcement Learning
Dimension-wise Entropy Modulation
Distributional Critic
High-Dimensional Humanoid Control
Stochastic Policy
👥 Authors
Jun Xue
College of Information Science and Technology, Eastern Institute of Technology, Ningbo, China; Department of Control Science and Engineering, Tongji University, Shanghai, China
Junze Wang
College of Information Science and Technology, Eastern Institute of Technology, Ningbo, China; College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China
Xinming Zhang
Professor, School of Computer Science and Technology, University of Science and Technology of China
Graph Neural Networks · Target Recognition · Wireless Networks · Big Data Security
Shanze Wang
The Hong Kong Polytechnic University
Mapless Navigation · Autonomous Systems · Reinforcement Learning
Yanjun Chen
University of Illinois Urbana-Champaign
Human-Computer Interaction · Haptics
Wei Zhang
College of Information Science and Technology, Eastern Institute of Technology, Ningbo, China
Reinforcement Learning · Motion Planning · Humanoid Robots · Intelligent Fault Diagnosis