How Long Does Infinite Width Last? Signal Propagation in Long-Range Linear Recurrences

📅 2026-05-06

📝 Abstract
We study signal propagation in linear recurrent models at finite width. While existing signal propagation theory relies predominantly on the infinite-width limit, it is unclear for how long that approximation stays accurate when recurrent depth $t$ grows jointly with width $n$. This question is especially relevant for modern recurrent sequence models, whose natural operating regime involves long input sequences, i.e., large $t$. We derive exact finite-width formulas for the hidden-state signal energies in linear recurrences under complex Gaussian initialization. Using these formulas, we identify the joint depth-width scaling regimes that govern signal propagation: (i) a subcritical regime $t=o(\sqrt n)$, in which the infinite-width approximation remains valid; (ii) a critical regime $t\sim c\sqrt n$, in which non-negligible deviations from infinite-width predictions appear and a nontrivial joint scaling limit emerges; and (iii) a supercritical regime $t\gg \sqrt n$, in which finite-width effects dominate. Thus, our results pinpoint the precise recurrent depth scale at which infinite-width theory breaks down in long-range linear recurrences. In turn, this shows when standard initialization schemes, such as Glorot, become unstable. More broadly, our results demonstrate that finite-width effects accumulate more rapidly with depth in recurrent models than in feedforward ones, leading to qualitatively different signal propagation behavior.
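The setting described in the abstract can be probed numerically. The sketch below is only an illustration under stated assumptions, not the paper's method: it assumes an input-free recurrence $h_t = W h_{t-1}$, a square weight matrix with i.i.d. complex Gaussian entries of variance $1/n$ (which coincides with the Glorot variance for a square matrix), and a unit-norm initial state. The function name `mean_signal_energy` and all parameter values are hypothetical. It estimates the average signal energy $\|h_t\|^2$ by Monte Carlo, so the result can be compared with a baseline of 1, the value one would expect from an infinite-width calculation under this normalization.

```python
import numpy as np


def mean_signal_energy(n, t_max, trials=200, seed=0):
    """Monte Carlo estimate of E||h_t||^2 for t = 0..t_max.

    Assumptions (not taken from the paper): h_t = W h_{t-1} with no input,
    W has i.i.d. complex Gaussian entries with E|W_ij|^2 = 1/n, and h_0 is
    the first standard basis vector (unit norm).
    """
    rng = np.random.default_rng(seed)
    energies = np.zeros(t_max + 1)
    for _ in range(trials):
        # Real and imaginary parts each have variance 1/(2n), so E|W_ij|^2 = 1/n.
        W = (rng.standard_normal((n, n))
             + 1j * rng.standard_normal((n, n))) / np.sqrt(2 * n)
        h = np.zeros(n, dtype=complex)
        h[0] = 1.0
        energies[0] += 1.0
        for t in range(1, t_max + 1):
            h = W @ h                          # one recurrent step
            energies[t] += np.vdot(h, h).real  # squared Euclidean norm of h_t
    return energies / trials


if __name__ == "__main__":
    n = 64
    energy = mean_signal_energy(n, t_max=4 * int(np.sqrt(n)))
    # Inspect depths below, at, and above sqrt(n).
    for t in (1, int(np.sqrt(n)) // 2, int(np.sqrt(n)), 4 * int(np.sqrt(n))):
        print(f"n={n}, t={t}: mean energy = {energy[t]:.3f}")
```

Sweeping $n$ while holding $t/\sqrt n$ fixed is one way to look for the critical scaling the abstract describes. The estimates are noisy at large $t$, so the number of trials may need to grow with depth.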
Problem

Research questions and friction points this paper is trying to address.

signal propagation
finite width
recurrent models
infinite-width limit
depth-width scaling
Innovation

Methods, ideas, or system contributions that make the work stand out.

signal propagation
finite-width analysis
linear recurrences
depth-width scaling
infinite-width limit