🤖 AI Summary
As neural networks grow deeper, gradients often explode or vanish, a problem rooted in spectral instability of the input-output Jacobian. Existing stability theories are limited to fully connected architectures under i.i.d. weight assumptions, and therefore fail to capture sparsity induced by pruning or the weak dependencies among weights that arise during training.
Method: We develop a general Jacobian spectral stability theorem applicable to sparse connectivity and non-i.i.d. (weakly dependent) weights, integrating random matrix theory, dependency modeling, and structured spectral analysis to formulate a more realistic initialization framework.
Contribution/Results: Our theory provides rigorous, verifiable spectral stability guarantees for both pruned models and post-training networks—addressing critical gaps left by classical initialization theory. It significantly extends the applicability boundary of deep network initialization theory, enabling principled design and analysis of modern sparse and trained architectures.
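To make the pruning gap concrete, here is a minimal numerical sketch (not the paper's construction): in a deep linear network at critical dense initialization, multiplying weights by an i.i.d. Bernoulli pruning mask shrinks the Jacobian spectrum geometrically with depth, while rescaling the surviving weights by 1/sqrt(keep_prob) restores an O(1) mean squared singular value. All function and variable names below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def sparse_jacobian_msv(depth, width, keep_prob, rescale, rng):
    """Mean squared singular value of a deep *linear* network's
    input-output Jacobian when each weight is kept i.i.d. with
    probability keep_prob (a toy model of unstructured pruning)."""
    J = np.eye(width)
    for _ in range(depth):
        # Dense critical init for a linear net: entries ~ N(0, 1/width).
        G = rng.standard_normal((width, width)) / np.sqrt(width)
        M = rng.random((width, width)) < keep_prob  # Bernoulli pruning mask
        W = M * G
        if rescale:
            # Compensate the variance removed by pruning.
            W = W / np.sqrt(keep_prob)
        J = W @ J
    s = np.linalg.svd(J, compute_uv=False)
    return float(np.mean(s**2))

p, depth, width = 0.3, 25, 300
naive = sparse_jacobian_msv(depth, width, p, rescale=False, rng=rng)
fixed = sparse_jacobian_msv(depth, width, p, rescale=True, rng=rng)
print(f"pruned, no rescale   : {naive:.3e}")  # decays roughly like p**depth
print(f"pruned, 1/sqrt(p) fix: {fixed:.3e}")  # stays order one
```

This captures only the simplest failure mode (i.i.d. masks, linear activations); the theorem described above is precisely about handling structured sparsity and weakly dependent weights where such elementary variance accounting no longer suffices.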
📝 Abstract
Deep neural networks are known to suffer from exploding or vanishing gradients as depth increases, a phenomenon closely tied to the spectral behavior of the input-output Jacobian. Prior work has identified critical initialization schemes that ensure Jacobian stability, but these analyses are typically restricted to fully connected networks with i.i.d. weights. In this work, we go significantly beyond these limitations: we establish a general stability theorem for deep neural networks that accommodates sparsity (such as that introduced by pruning) and non-i.i.d., weakly correlated weights (e.g., induced by training). Our results rely on recent advances in random matrix theory, and provide rigorous guarantees for spectral stability in a much broader class of network models. This extends the theoretical foundation for initialization schemes in modern neural networks with structured and dependent randomness.
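The depth-dependence described above can be seen directly in a small experiment (an illustrative sketch under standard i.i.d. assumptions, not the paper's proof technique): for a deep ReLU network, the classical critical variance is 2/width (He initialization), and moving the weight variance off that value makes the Jacobian's singular values vanish or explode exponentially in depth.

```python
import numpy as np

rng = np.random.default_rng(0)

def jacobian_spectrum(depth, width, sigma2, rng):
    """Singular values of a deep ReLU net's input-output Jacobian at a
    random input. Per layer, J_l = D_l @ W_l, where D_l is the diagonal
    0/1 matrix of ReLU derivatives; the full Jacobian is the product."""
    x = rng.standard_normal(width)
    J = np.eye(width)
    for _ in range(depth):
        W = rng.standard_normal((width, width)) * np.sqrt(sigma2 / width)
        pre = W @ x
        D = (pre > 0).astype(float)      # ReLU derivative mask
        J = (D[:, None] * W) @ J         # scale rows of W by D, accumulate
        x = np.maximum(pre, 0.0)
    return np.linalg.svd(J, compute_uv=False)

width, depth = 200, 30
s_crit = jacobian_spectrum(depth, width, 2.0, rng)   # critical for ReLU
s_small = jacobian_spectrum(depth, width, 1.0, rng)  # below critical
s_big = jacobian_spectrum(depth, width, 4.0, rng)    # above critical
for name, s in [("sigma^2=2", s_crit), ("sigma^2=1", s_small), ("sigma^2=4", s_big)]:
    print(f"{name}: mean squared singular value = {np.mean(s**2):.3e}")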