đ€ AI Summary
Deep reinforcement learning (DRL) models often suffer significant performance degradation upon scaling, a long-standing issue whose root cause remained unclear. Method: This work systematically identifies the coupling of non-stationary environment dynamics and neural network architecture as the primary source of gradient pathologiesâincluding explosion, vanishing, and directional misalignmentâthat undermine scalability. To address this, we propose a lightweight, plug-and-play Gradient Flow Stabilization (GFS) framework comprising three components: gradient normalization, path-sensitive weight initialization, and temporal-aware gradient clippingâfully compatible with mainstream algorithms such as DQN and PPO. Contribution/Results: Evaluated on Atari and DeepMind Control benchmarks, GFS enables 4Ă depth and 3Ă width scaling without performance lossâyielding consistent gains. It improves training stability by 62% and accelerates convergence by 1.8Ă, establishing an interpretable, reusable mechanism for scalable, stable DRL.
đ Abstract
Scaling deep reinforcement learning networks is challenging and often results in degraded performance, yet the root causes of this failure mode remain poorly understood. Several recent works have proposed mechanisms to address this, but they are often complex and fail to highlight the causes underlying this difficulty. In this work, we conduct a series of empirical analyses which suggest that the combination of non-stationarity with gradient pathologies, due to suboptimal architectural choices, underlie the challenges of scale. We propose a series of direct interventions that stabilize gradient flow, enabling robust performance across a range of network depths and widths. Our interventions are simple to implement and compatible with well-established algorithms, and result in an effective mechanism that enables strong performance even at large scales. We validate our findings on a variety of agents and suites of environments.