🤖 AI Summary
As neural networks grow deeper, gradients often explode or vanish, a problem rooted in spectral instability of the input-output Jacobian. Existing stability theories are limited to fully connected architectures under i.i.d. weight assumptions, and therefore fail to capture sparsity induced by pruning or the weak dependencies among weights that arise during training.
Method: We develop a general Jacobian spectral stability theorem applicable to sparse connectivity and non-i.i.d. (weakly dependent) weights, integrating random matrix theory, dependency modeling, and structured spectral analysis to formulate a more realistic initialization framework.
Contribution/Results: Our theory provides rigorous, verifiable spectral stability guarantees for both pruned models and post-training networks—addressing critical gaps left by classical initialization theory. It significantly extends the applicability boundary of deep network initialization theory, enabling principled design and analysis of modern sparse and trained architectures.
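To make the pruning gap concrete, here is a minimal numerical sketch (not the paper's construction): in a deep linear network at critical dense initialization, multiplying weights by an i.i.d. Bernoulli pruning mask shrinks the Jacobian spectrum geometrically with depth, while rescaling the surviving weights by 1/sqrt(keep_prob) restores an O(1) mean squared singular value. All function and variable names below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def sparse_jacobian_msv(depth, width, keep_prob, rescale, rng):
    """Mean squared singular value of a deep *linear* network's
    input-output Jacobian when each weight is kept i.i.d. with
    probability keep_prob (a toy model of unstructured pruning)."""
    J = np.eye(width)
    for _ in range(depth):
        # Dense critical init for a linear net: entries ~ N(0, 1/width).
        G = rng.standard_normal((width, width)) / np.sqrt(width)
        M = rng.random((width, width)) < keep_prob  # Bernoulli pruning mask
        W = M * G
        if rescale:
            # Compensate the variance removed by pruning.
            W = W / np.sqrt(keep_prob)
        J = W @ J
    s = np.linalg.svd(J, compute_uv=False)
    return float(np.mean(s**2))

p, depth, width = 0.3, 25, 300
naive = sparse_jacobian_msv(depth, width, p, rescale=False, rng=rng)
fixed = sparse_jacobian_msv(depth, width, p, rescale=True, rng=rng)
print(f"pruned, no rescale   : {naive:.3e}")  # decays roughly like p**depth
print(f"pruned, 1/sqrt(p) fix: {fixed:.3e}")  # stays order one
```

This captures only the simplest failure mode (i.i.d. masks, linear activations); the theorem described above is precisely about handling structured sparsity and weakly dependent weights where such elementary variance accounting no longer suffices.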
📝 Abstract
Deep neural networks are known to suffer from exploding or vanishing gradients as depth increases, a phenomenon closely tied to the spectral behavior of the input-output Jacobian. Prior work has identified critical initialization schemes that ensure Jacobian stability, but these analyses are typically restricted to fully connected networks with i.i.d. weights. In this work, we go significantly beyond these limitations: we establish a general stability theorem for deep neural networks that accommodates sparsity (such as that introduced by pruning) and non-i.i.d., weakly correlated weights (e.g., induced by training). Our results rely on recent advances in random matrix theory, and provide rigorous guarantees for spectral stability in a much broader class of network models. This extends the theoretical foundation for initialization schemes in modern neural networks with structured and dependent randomness.
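The depth-dependence described above can be seen directly in a small experiment (an illustrative sketch under standard i.i.d. assumptions, not the paper's proof technique): for a deep ReLU network, the classical critical variance is 2/width (He initialization), and moving the weight variance off that value makes the Jacobian's singular values vanish or explode exponentially in depth.

```python
import numpy as np

rng = np.random.default_rng(0)

def jacobian_spectrum(depth, width, sigma2, rng):
    """Singular values of a deep ReLU net's input-output Jacobian at a
    random input. Per layer, J_l = D_l @ W_l, where D_l is the diagonal
    0/1 matrix of ReLU derivatives; the full Jacobian is the product."""
    x = rng.standard_normal(width)
    J = np.eye(width)
    for _ in range(depth):
        W = rng.standard_normal((width, width)) * np.sqrt(sigma2 / width)
        pre = W @ x
        D = (pre > 0).astype(float)      # ReLU derivative mask
        J = (D[:, None] * W) @ J         # scale rows of W by D, accumulate
        x = np.maximum(pre, 0.0)
    return np.linalg.svd(J, compute_uv=False)

width, depth = 200, 30
s_crit = jacobian_spectrum(depth, width, 2.0, rng)   # critical for ReLU
s_small = jacobian_spectrum(depth, width, 1.0, rng)  # below critical
s_big = jacobian_spectrum(depth, width, 4.0, rng)    # above critical
for name, s in [("sigma^2=2", s_crit), ("sigma^2=1", s_small), ("sigma^2=4", s_big)]:
    print(f"{name}: mean squared singular value = {np.mean(s**2):.3e}")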