Understanding and Improving Hyperbolic Deep Reinforcement Learning

📅 2025-12-16

📈 Citations: 0

✨ Influential: 0

career value

221K/year

🤖 AI Summary

Hyperbolic representation learning in reinforcement learning effectively captures environmental hierarchical structure, yet suffers from gradient instability under non-stationary training due to unbounded embedding norms—undermining PPO’s trust-region optimization. This work establishes, for the first time, a theoretical connection between embedding norm magnitude in hyperbolic space and policy optimization stability. We propose Hyper++, a novel framework comprising (i) a classification-based value loss, (ii) norm-bounded regularization, and (iii) an optimization-friendly hyperbolic layer design. Evaluated on ProcGen, Hyper++ improves training stability and accelerates convergence by ~30%. On Atari-5, it consistently outperforms both Euclidean and state-of-the-art hyperbolic baselines. All code is publicly released.

Technology Category

Application Category

📝 Abstract

The performance of reinforcement learning (RL) agents depends critically on the quality of the underlying feature representations. Hyperbolic feature spaces are well-suited for this purpose, as they naturally capture hierarchical and relational structure often present in complex RL environments. However, leveraging these spaces commonly faces optimization challenges due to the nonstationarity of RL. In this work, we identify key factors that determine the success and failure of training hyperbolic deep RL agents. By analyzing the gradients of core operations in the Poincaré Ball and Hyperboloid models of hyperbolic geometry, we show that large-norm embeddings destabilize gradient-based training, leading to trust-region violations in proximal policy optimization (PPO). Based on these insights, we introduce Hyper++, a new hyperbolic PPO agent that consists of three components: (i) stable critic training through a categorical value loss instead of regression; (ii) feature regularization guaranteeing bounded norms while avoiding the curse of dimensionality from clipping; and (iii) using a more optimization-friendly formulation of hyperbolic network layers. In experiments on ProcGen, we show that Hyper++ guarantees stable learning, outperforms prior hyperbolic agents, and reduces wall-clock time by approximately 30%. On Atari-5 with Double DQN, Hyper++ strongly outperforms Euclidean and hyperbolic baselines. We release our code at https://github.com/Probabilistic-and-Interactive-ML/hyper-rl .

Problem

Research questions and friction points this paper is trying to address.

Addresses optimization challenges in hyperbolic deep reinforcement learning.

Identifies factors causing training instability in hyperbolic RL agents.

Proposes methods to stabilize training and improve performance in RL.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Stable critic training with categorical value loss

Feature regularization ensuring bounded embedding norms

Optimization-friendly hyperbolic network layer formulation

🔎 Similar Papers

Analyzing and Bridging the Gap between Maximizing Total Reward and Discounted Reward in Deep Reinforcement Learning