🤖 AI Summary
Glorot initialization, derived under assumptions of infinite width and fixed sequence length, induces hidden-state explosion or vanishing in long-sequence linear RNNs; we prove that even a slight positive deviation in the spectral radius triggers exponential instability once the sequence length reaches $t = O(\sqrt{n})$, where $n$ is the hidden dimension.
Method: We propose a dimension-aware rescaling of Glorot initialization that constrains the spectral radius strictly below one, ensuring stability for long sequences.
Contribution/Results: Leveraging spectral analysis and random matrix theory, we rigorously establish that standard Glorot initialization inevitably destabilizes at sequence lengths of $O(\sqrt{n})$. Numerical experiments confirm that our method substantially mitigates gradient explosion/vanishing, improves training stability, and enhances generalization performance, constituting the first theoretically grounded, stable recurrent initialization framework tailored to long sequences.
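The setup behind this claim can be illustrated numerically. The sketch below (not the paper's code) uses NumPy and the fact that Glorot-normal variance for a square $n \times n$ recurrent matrix is $2/(n+n) = 1/n$; by the circular law, the spectral radius of such a matrix concentrates near 1, so repeated application neither cleanly contracts nor cleanly expands the hidden state:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 512  # hidden width

# Glorot (Xavier) normal init for a square recurrent matrix:
# variance 2 / (fan_in + fan_out) = 1 / n.
W = rng.normal(0.0, np.sqrt(1.0 / n), size=(n, n))

# Spectral radius concentrates near 1 (circular law).
rho = np.max(np.abs(np.linalg.eigvals(W)))
print(f"spectral radius ~ {rho:.3f}")

# Track the hidden-state norm under the linear recurrence h_{t+1} = W h_t
# out to a few multiples of sqrt(n), the horizon at which the paper
# argues instability sets in.
h = rng.normal(size=n)
h /= np.linalg.norm(h)
norms = []
for t in range(int(4 * np.sqrt(n))):
    h = W @ h
    norms.append(np.linalg.norm(h))
print(f"norm after {len(norms)} steps: {norms[-1]:.3e}")
```

With the spectral radius hovering at 1, the per-step norm fluctuations compound over time rather than averaging out, which is the mechanism the analysis makes precise.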
📝 Abstract
Proper initialization is critical for Recurrent Neural Networks (RNNs), particularly in long-range reasoning tasks, where repeated application of the same weight matrix can cause vanishing or exploding signals. A common baseline for linear recurrences is Glorot initialization, designed to ensure stable signal propagation, but derived in the infinite-width, fixed-length regime, an unrealistic setting for RNNs processing long sequences. In this work, we show that Glorot initialization is in fact unstable: small positive deviations in the spectral radius are amplified through time and cause the hidden state to explode. Our theoretical analysis demonstrates that sequences of length $t = O(\sqrt{n})$, where $n$ is the hidden width, are sufficient to induce instability. To address this, we propose a simple, dimension-aware rescaling of Glorot that shifts the spectral radius slightly below one, preventing rapid signal explosion or decay. These results suggest that standard initialization schemes may break down in the long-sequence regime, motivating a separate line of theory for stable recurrent initialization.
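The proposed fix can be sketched as a post-hoc rescaling of the Glorot draw. The specific damping factor `1 - c / sqrt(n)` and the constant `c` below are hypothetical illustrative choices, not necessarily the paper's exact scheme; the point is only that an $O(1/\sqrt{n})$ shrinkage pulls the circular-law edge below one:

```python
import numpy as np

def scaled_glorot_recurrent(n, c=2.0, rng=None):
    """Glorot-normal n x n recurrent init, rescaled so the spectral
    radius sits slightly below one.

    The factor (1 - c / sqrt(n)) is an illustrative choice: it pushes
    the circular-law edge (radius ~ 1) down by O(1/sqrt(n)), matching
    the t = O(sqrt(n)) horizon at which instability would otherwise
    set in.
    """
    rng = rng or np.random.default_rng()
    W = rng.normal(0.0, np.sqrt(1.0 / n), size=(n, n))  # standard Glorot
    return (1.0 - c / np.sqrt(n)) * W                   # shift radius below 1

rng = np.random.default_rng(0)
n = 512
W = scaled_glorot_recurrent(n, rng=rng)
rho = np.max(np.abs(np.linalg.eigvals(W)))
print(f"rescaled spectral radius ~ {rho:.3f}")  # below 1 with high probability
```

A radius just below one lets signals decay only slowly over the first $O(\sqrt{n})$ steps while ruling out exponential blow-up, which is the trade-off the dimension-aware design targets.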