From SGD to Spectra: A Theory of Neural Network Weight Dynamics

📅 2025-07-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Understanding the theoretical mechanisms governing the dynamic evolution of singular value spectra of weight matrices during deep neural network training—particularly the mapping between microscopic SGD dynamics and macroscopic spectral structure—remains an open challenge. Method: We propose a continuous-time modeling framework based on matrix-valued stochastic differential equations (SDEs), enabling rigorous analysis of weight matrix dynamics. Contribution/Results: We theoretically derive, for the first time, that the squared singular values evolve according to Dyson Brownian motion with repulsion, and we rigorously establish their stationary distribution as a Gamma-type distribution with power-law tails—explaining the empirically ubiquitous “bulk+tail” spectral structure. Integrating random matrix theory and spectral analysis, our predictions quantitatively match observed spectral evolution across MLPs and Transformers. This work provides the first analytically tractable and empirically verifiable spectral evolution theory for deep learning training dynamics.

📝 Abstract
Deep neural networks have revolutionized machine learning, yet their training dynamics remain theoretically unclear. We develop a continuous-time, matrix-valued stochastic differential equation (SDE) framework that rigorously connects the microscopic dynamics of SGD to the macroscopic evolution of singular-value spectra in weight matrices. We derive exact SDEs showing that squared singular values follow Dyson Brownian motion with eigenvalue repulsion, and characterize stationary distributions as gamma-type densities with power-law tails, providing the first theoretical explanation for the empirically observed "bulk+tail" spectral structure in trained networks. Through controlled experiments on transformer and MLP architectures, we validate our theoretical predictions and demonstrate quantitative agreement between SDE-based forecasts and observed spectral evolution, providing a rigorous foundation for understanding why deep learning works.
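The abstract's framework can be illustrated with a minimal toy experiment (a sketch under my own assumptions; the loss, step sizes, and matrix sizes here are illustrative, not the paper's): discretize a matrix-valued SDE of the form dW = -∇L(W) dt + σ dB with an Euler–Maruyama step, and track the squared singular values of W over training.

```python
import numpy as np

rng = np.random.default_rng(0)
n, steps, lr, sigma = 8, 2000, 1e-2, 1e-2

# Toy quadratic loss L(W) = 0.5 * ||W - T||_F^2 with a random target T
# (an illustrative stand-in for a real training loss)
T = rng.standard_normal((n, n))
W = rng.standard_normal((n, n))

spectra = []
for _ in range(steps):
    grad = W - T                                      # gradient of the toy loss
    noise = rng.standard_normal((n, n))               # models SGD gradient noise
    W = W - lr * grad + sigma * np.sqrt(lr) * noise   # Euler-Maruyama step
    spectra.append(np.linalg.svd(W, compute_uv=False) ** 2)

spectra = np.array(spectra)                           # shape: (steps, n)
print("initial top squared singular value:", spectra[0, 0])
print("final   top squared singular value:", spectra[-1, 0])
```

Even in this toy setting, the recorded trajectory `spectra` gives the kind of microscopic-to-spectral mapping the paper studies: deterministic drift toward the loss minimizer plus a noise-induced stationary spread of the squared singular values.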
Problem

Research questions and friction points this paper is trying to address.

Modeling neural network weight dynamics via continuous-time SDEs
Explaining bulk+tail spectral structure in trained networks
Validating theoretical predictions with transformer and MLP experiments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Continuous-time matrix-valued SDE framework
Dyson Brownian motion for singular values
Gamma-type densities explain spectral structure
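The repulsion claim above can be sanity-checked numerically (a sketch based on standard random-matrix facts, not the paper's code): let a rectangular weight matrix evolve as pure matrix Brownian motion and track the gaps between its ordered squared singular values, which evolve as a Dyson-type (Wishart/Laguerre) process and generically never collide.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, steps, dt = 8, 6, 500, 1e-3

# Rectangular matrix Brownian motion; the ordered squared singular values
# evolve as a Dyson-type process with mutual repulsion.
W = np.zeros((m, n))
min_gap = np.inf
for _ in range(steps):
    W += np.sqrt(dt) * rng.standard_normal((m, n))
    sq = np.sort(np.linalg.svd(W, compute_uv=False) ** 2)
    min_gap = min(min_gap, np.diff(sq).min())

print("smallest gap between squared singular values:", min_gap)
```

The smallest gap observed over the whole trajectory stays strictly positive: the level repulsion keeps the spectrum spread out, which is the mechanism behind the bulk+tail structure the paper analyzes.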
Authors
Brian Richard Olsen, California Institute of Technology, Pasadena, California, USA
Sam Fatehmanesh, California Institute of Technology, Pasadena, California, USA
Frank Xiao, Caltech
Adarsh Kumarappan, unknown affiliation
Anirudh Gajula, California Institute of Technology, Pasadena, California, USA