Spectral alignment of stochastic gradient descent for high-dimensional classification tasks

📅 2023-10-04
📈 Citations: 14
Influential: 4
🤖 AI Summary
This work investigates the relationship between SGD training dynamics and the spectral structure of empirical Hessian and gradient matrices in high-dimensional multiclass classification. Methodologically, it combines theoretical modeling with spectral analysis to characterize how these matrices evolve throughout training. The key contribution is a rigorous proof that the SGD trajectory aligns, layer by layer, with the outlier eigenspaces (those spanned by the top eigenvalues) of the layer-wise Hessian and gradient matrices. Crucially, the final layer's outlier eigenspace evolves over the course of training and exhibits rank deficiency when SGD converges to a sub-optimal classifier, so the rank of this eigenspace serves as a spectral diagnostic for suboptimal convergence. These results are formally established for high-dimensional mixture models and one- and two-layer neural networks, offering a spectral-geometric perspective that confirms predictions previously supported mainly by numerical studies of overparameterized networks.
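One way to probe this alignment empirically is to measure the overlap between a layer's current parameter vector and the top-k outlier eigenspace of its empirical gradient second-moment matrix. The sketch below is our own minimal construction, not code from the paper; all function and variable names are illustrative.

```python
import numpy as np

def outlier_overlap(theta, grads, k):
    """Overlap of a layer's parameter vector with the top-k outlier
    eigenspace of the empirical gradient second-moment matrix
    G = (1/n) * sum_i g_i g_i^T.

    theta : (d,)   current parameters of the layer (flattened)
    grads : (n, d) per-sample gradients for that layer
    k     : number of top (outlier) eigenvalues to keep
    """
    G = grads.T @ grads / grads.shape[0]       # empirical gradient matrix
    _, eigvecs = np.linalg.eigh(G)             # eigenvalues in ascending order
    top = eigvecs[:, -k:]                      # top-k outlier eigenvectors
    proj = top.T @ theta                       # coordinates in the outlier eigenspace
    return float(np.linalg.norm(proj) / np.linalg.norm(theta))  # value in [0, 1]
```

Tracked per layer over training, this ratio should tend to 1 if the iterates concentrate in the outlier eigenspace, which is the behavior the paper proves in its mixture and one-/two-layer settings.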
📝 Abstract
We rigorously study the relation between the training dynamics via stochastic gradient descent (SGD) and the spectra of empirical Hessian and gradient matrices. We prove that in two canonical classification tasks for multi-class high-dimensional mixtures and either one- or two-layer neural networks, both the SGD trajectory and emergent outlier eigenspaces of the Hessian and gradient matrices align with a common low-dimensional subspace. Moreover, in multi-layer settings this alignment occurs per layer, with the final layer's outlier eigenspace evolving over the course of training and exhibiting rank deficiency when the SGD converges to sub-optimal classifiers. This establishes some of the rich predictions that have arisen from extensive numerical studies over the last decade about the spectra of Hessian and information matrices over the course of training in overparametrized networks.
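In symbols, the alignment claim can be paraphrased as follows; the notation below is ours, not the paper's, and this is a schematic restatement rather than the exact theorem.

```latex
% Schematic paraphrase of the alignment statement (notation is illustrative).
% \theta_t : SGD iterate for a given layer at step t
% E_t      : span of the outlier (top) eigenvectors of the empirical
%            Hessian \hat{H}_t or gradient matrix \hat{G}_t for that layer
% P_{E_t}  : orthogonal projection onto E_t
\[
  \frac{\lVert P_{E_t}\,\theta_t \rVert}{\lVert \theta_t \rVert} \;\approx\; 1
  \quad \text{after an initial phase of training,}
\]
\[
  \text{with } \dim(E_t) \text{ for the final layer falling short of its value
  at optimal classifiers when SGD converges to a sub-optimal one.}
\]
```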
Problem

Research questions and friction points this paper is trying to address.

Characterize SGD dynamics jointly with the spectra of empirical Hessian and gradient matrices in high-dimensional classification
Prove that the SGD trajectory aligns with an emerging low-dimensional outlier eigenspace of these matrices
Establish that this alignment holds layer by layer in multi-layer networks (a way to track the relevant spectra is sketched after this list)
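Tracking such spectra during training does not require forming the full Hessian. Below is a minimal sketch, assuming a PyTorch model (our construction, not the paper's code), that estimates a layer's top Hessian eigenvalue by power iteration on Hessian-vector products.

```python
import torch

def top_hessian_eigenvalue(loss, params, n_iter=50):
    """Estimate the largest-magnitude Hessian eigenvalue restricted to
    `params` (e.g., one layer's weights) via power iteration on
    Hessian-vector products computed with double backpropagation."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    v = [torch.randn_like(p) for p in params]
    for _ in range(n_iter):
        # Hessian-vector product: differentiate <grads, v> w.r.t. params
        hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
        norm = torch.sqrt(sum((h ** 2).sum() for h in hv))
        v = [h / norm for h in hv]               # normalize for the next step
    hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
    return sum((h * u).sum() for h, u in zip(hv, v)).item()  # Rayleigh quotient
```

Repeating with deflation (subtracting projections onto already-found eigenvectors) recovers the next few eigenvalues, enough to watch an outlier eigenspace emerge per layer.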
Innovation

Methods, ideas, or system contributions that make the work stand out.

SGD trajectory and the emergent outlier eigenspaces of the Hessian and gradient matrices align with a common low-dimensional subspace
This spectral alignment is proven per layer, not only globally
Final layer's outlier eigenspace evolves during training and becomes rank deficient at sub-optimal classifiers (a simple rank diagnostic is sketched below)
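The rank-deficiency result suggests a simple diagnostic. The sketch below is our heuristic, not the paper's definition: the multiplicative gap criterion and all names are assumptions. It counts eigenvalues of the final-layer empirical gradient matrix that separate from the bulk.

```python
import numpy as np

def outlier_count(grads, gap_ratio=10.0):
    """Count eigenvalues of the final-layer empirical gradient matrix that
    stand out from the bulk by a multiplicative gap (a crude heuristic).

    grads : (n, d) per-sample gradients of the final layer
    """
    G = grads.T @ grads / grads.shape[0]
    ev = np.linalg.eigvalsh(G)[::-1]      # eigenvalues, descending
    bulk = np.median(ev)                  # rough scale of the bulk spectrum
    return int(np.sum(ev > gap_ratio * bulk))
```

For a k-class problem one would expect this count to reach roughly the order of k at a good classifier; a count that stalls below that as training converges is the rank deficiency the abstract associates with sub-optimal classifiers.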