Hyperspherical Normalization for Scalable Deep Reinforcement Learning

📅 2025-02-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
In deep reinforcement learning (DRL), scaling model size and compute often exacerbates training instability and overfitting because the data distribution is non-stationary. To address this, the authors propose SimbaV2, an architecture that applies hyperspherical normalization to DRL and combines it with distributional value estimation and reward-adaptive scaling, jointly constraining weight and feature norms to stabilize gradients. The key contributions are: (1) a norm-constrained optimization framework built on hyperspherical normalization, which keeps parameter and activation magnitudes bounded; and (2) distributional value estimation with dynamic reward scaling, which mitigates scalability bottlenecks under non-stationary data. Evaluated on 57 continuous control benchmarks across four domains, SimbaV2 consistently surpasses state-of-the-art methods, improving both training stability and generalization for large-scale DRL models.

📝 Abstract
Scaling up the model size and computation has brought consistent performance improvements in supervised learning. However, this lesson often fails to apply to reinforcement learning (RL) because training the model on non-stationary data easily leads to overfitting and unstable optimization. In response, we introduce SimbaV2, a novel RL architecture designed to stabilize optimization by (i) constraining the growth of weight and feature norm by hyperspherical normalization; and (ii) using a distributional value estimation with reward scaling to maintain stable gradients under varying reward magnitudes. Using the soft actor-critic as a base algorithm, SimbaV2 scales up effectively with larger models and greater compute, achieving state-of-the-art performance on 57 continuous control tasks across 4 domains. The code is available at https://dojeon-ai.github.io/SimbaV2.
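As a rough illustration of idea (i), hyperspherical normalization can be read as l2-normalizing weight vectors and features so they lie on the unit hypersphere, which bounds their norms. The following is a minimal sketch in plain Python; the `l2_normalize` helper and the toy two-unit layer are assumptions for illustration, not SimbaV2's actual implementation:

```python
import math

def l2_normalize(v, eps=1e-8):
    """Project vector v onto the unit hypersphere (bounds its norm)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / (norm + eps) for x in v]

# Hypothetical 2-unit linear layer: each unit's incoming weight vector
# is kept on the unit hypersphere, so weight norms cannot grow.
W = [l2_normalize([0.5, -1.0, 2.0]),
     l2_normalize([1.5, 0.25, -0.75])]

# Features are normalized the same way, bounding activation magnitudes.
h = l2_normalize([3.0, -4.0, 12.0])

# Each output is then a cosine-similarity-style value, bounded in [-1, 1]
# by Cauchy-Schwarz, since both factors are unit-norm.
y = [sum(w_i * h_i for w_i, h_i in zip(w, h)) for w in W]
```

Because both the weights and the features are unit-norm, the layer's pre-activations stay in a fixed range regardless of how long training runs, which is the norm-boundedness property the abstract refers to.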
Problem

Research questions and friction points this paper is trying to address.

Stabilize optimization in reinforcement learning
Prevent overfitting in large RL models
Maintain stable gradients with varying rewards
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hyperspherical normalization constrains weight growth
Distributional value estimation stabilizes gradients
Scalable reinforcement learning with larger models
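The reward-scaling idea in the bullets above is commonly implemented by dividing each reward by a running estimate of the discounted return's standard deviation, so gradient magnitudes stay comparable across environments with very different reward scales. A hedged sketch follows; the `RewardScaler` class and its update rule are illustrative assumptions, not SimbaV2's exact reward-scaling mechanism:

```python
import math

class RewardScaler:
    """Scale rewards by a running std of discounted returns
    (Welford's online variance). Illustrative sketch only."""

    def __init__(self, gamma=0.99, eps=1e-8):
        self.gamma = gamma
        self.eps = eps
        self.ret = 0.0    # running discounted return
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0     # Welford accumulator for variance

    def scale(self, reward):
        # Track the discounted return, then update its running variance.
        self.ret = self.gamma * self.ret + reward
        self.count += 1
        delta = self.ret - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (self.ret - self.mean)
        var = self.m2 / self.count if self.count > 1 else 1.0
        # Divide the reward by the return's std to keep magnitudes stable.
        return reward / (math.sqrt(var) + self.eps)

scaler = RewardScaler()
scaled = [scaler.scale(r) for r in [100.0, -50.0, 200.0, 0.0]]
```

A similar trick appears in common RL tooling (e.g. reward-normalization wrappers); the point here is only that dividing by a return-scale statistic keeps value targets, and hence gradients, well-conditioned when reward magnitudes vary.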
Hojoon Lee
KAIST
Youngdo Lee
KAIST
Takuma Seno
Turing Inc.
Donghu Kim
KAIST
Peter Stone
UT Austin
J. Choo
KAIST

Topics: Deep reinforcement learning · Deep learning