🤖 AI Summary
This work investigates how the largest Hessian eigenvalue evolves during deep neural network training, focusing on how the sharpening-stabilization dynamics differ between full-batch and mini-batch optimization. Working in a high-dimensional regime, we combine random matrix theory, Neural Tangent Kernel (NTK) analysis, and Hessian spectral modeling to explain the origin of conservative sharpening, the slower curvature growth observed under mini-batch training. We show that at small batch sizes the stability boundary of mini-batch optimization is governed by the trace of the NTK rather than by the large Hessian eigenvalues, and we empirically validate both the existence and the controllability of this stochastic edge of stability. Our findings yield new theoretical principles for understanding optimization dynamics in overparameterized settings and inform the design of robust training algorithms.
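To make the full-batch phenomenology concrete, here is a minimal sketch (our illustration, not the paper's code; the toy model, data, and step size are arbitrary choices) that runs full-batch gradient descent on a small network while estimating the largest Hessian eigenvalue by power iteration on Hessian-vector products. Progressive sharpening appears as growth of the estimate, and the full-batch edge of stability predicts it saturates near 2/η for step size η.

```python
# Toy demonstration of progressive sharpening and the full-batch edge of
# stability: track the top Hessian eigenvalue of the training loss under
# full-batch gradient descent. (Illustrative sketch, not the paper's code.)
import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree

key = jax.random.PRNGKey(0)
kx, ky, k1, k2 = jax.random.split(key, 4)

# Small synthetic regression problem; all sizes are arbitrary.
X = jax.random.normal(kx, (64, 10))
y = jax.random.normal(ky, (64,))
params = {
    "W1": jax.random.normal(k1, (10, 32)) / jnp.sqrt(10.0),
    "w2": jax.random.normal(k2, (32,)) / jnp.sqrt(32.0),
}

def loss(p):
    h = jnp.tanh(X @ p["W1"])  # hidden layer
    return 0.5 * jnp.mean((h @ p["w2"] - y) ** 2)

def hvp(p, v):
    # Hessian-vector product via forward-over-reverse differentiation.
    return jax.jvp(jax.grad(loss), (p,), (v,))[1]

def top_hessian_eig(p, n_iter=100):
    # Power iteration on the Hessian, working with flattened parameters.
    flat, unravel = ravel_pytree(p)
    v = jax.random.normal(jax.random.PRNGKey(42), flat.shape)
    v = v / jnp.linalg.norm(v)
    for _ in range(n_iter):
        hv = ravel_pytree(hvp(p, unravel(v)))[0]
        v = hv / jnp.linalg.norm(hv)
    return jnp.dot(v, ravel_pytree(hvp(p, unravel(v)))[0])  # Rayleigh quotient

eta = 0.05  # step size; the edge of stability predicts lambda_max ~ 2/eta
grad_fn = jax.jit(jax.grad(loss))
for step in range(2001):
    if step % 400 == 0:
        print(f"step {step:4d}  loss {float(loss(params)):.4f}  "
              f"lambda_max {float(top_hessian_eig(params)):.2f}  "
              f"2/eta {2 / eta:.2f}")
    g = grad_fn(params)
    params = jax.tree_util.tree_map(lambda w, gw: w - eta * gw, params, g)
```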
📝 Abstract
Recent empirical and theoretical work has shown that the dynamics of the large eigenvalues of the training loss Hessian have some remarkably robust features across models and datasets in the full-batch regime. There is often an early period of progressive sharpening where the large eigenvalues increase, followed by stabilization at a predictable value known as the edge of stability. Previous work showed that in the stochastic setting, the eigenvalues increase more slowly, a phenomenon we call conservative sharpening. We provide a theoretical analysis of a simple high-dimensional model that shows the origin of this slowdown. We also show that there is an alternative stochastic edge of stability which arises at small batch size and is sensitive to the trace of the Neural Tangent Kernel rather than the large Hessian eigenvalues. We conduct an experimental study that highlights the qualitative differences from the full-batch phenomenology and suggests that controlling the stochastic edge of stability can help optimization.
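Because the stochastic edge of stability is sensitive to the trace of the NTK rather than to the large Hessian eigenvalues, a natural quantity to log during mini-batch training is Tr(K) for the empirical NTK K_ij = ⟨∇_θ f(x_i), ∇_θ f(x_j)⟩, which equals the sum of squared per-example gradient norms. The sketch below (our illustration with assumed model names and sizes, not the authors' code) computes this sum directly, without forming the n × n kernel; the precise threshold relating the trace to step size and batch size is derived in the paper and not reproduced here.

```python
# Monitoring the empirical NTK trace, the quantity the stochastic edge of
# stability is sensitive to. For a scalar-output model f(x; theta),
# Tr(K) = sum_i ||grad_theta f(x_i)||^2, so the n x n kernel never needs
# to be materialized. (Illustrative sketch, not the paper's code.)
import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree

key = jax.random.PRNGKey(0)
kx, k1, k2 = jax.random.split(key, 3)
X = jax.random.normal(kx, (64, 10))  # arbitrary toy inputs
params = {
    "W1": jax.random.normal(k1, (10, 32)) / jnp.sqrt(10.0),
    "w2": jax.random.normal(k2, (32,)) / jnp.sqrt(32.0),
}

def model(p, x):
    return jnp.tanh(x @ p["W1"]) @ p["w2"]  # scalar output per example

def ntk_trace(p, X):
    # Squared gradient norm of a single example's output w.r.t. parameters.
    def sq_grad_norm(x):
        g = jax.grad(lambda q: model(q, x))(p)
        return jnp.sum(ravel_pytree(g)[0] ** 2)
    # Vectorize over examples and sum: Tr(K) without building K.
    return jnp.sum(jax.vmap(sq_grad_norm)(X))

print(f"Tr(NTK) = {float(ntk_trace(params, X)):.3f}")
```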