🤖 AI Summary
In overparameterized learning, global minima are generally not unique, and it remains unclear which of them are dynamically stable, and hence practically attainable, under stochastic gradient descent (SGD) as opposed to deterministic gradient descent.
Method: The authors introduce a characteristic Lyapunov exponent, a notion from dynamical systems theory that depends on the local dynamics around a global minimum, as a rigorous stability criterion, and use it to classify global minima as dynamically stable or unstable under both deterministic gradient descent and SGD.
Contribution/Results: They rigorously prove that the sign of this Lyapunov exponent determines whether SGD can accumulate at a given global minimum: only minima with a negative exponent are dynamically stable, while a positive exponent rules a minimum out as a limit of the stochastic dynamics. This stability-based selection acts as a form of implicit regularization, since SGD can only settle at the stable subset of global minima, and it provides a dynamical-systems perspective on generalization bias in overparameterized models.
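As an illustrative one-dimensional instance of this criterion (a sketch in the spirit of standard linear-stability analyses, not a formula quoted from the paper): near a global minimum, an SGD step with learning rate $\eta$ on a randomly sampled example with local curvature $h_{i_t}$ acts on a deviation $x_t$ as

$$
x_{t+1} = \left(1 - \eta\, h_{i_t}\right) x_t,
\qquad
\lambda(\eta) = \mathbb{E}_i\!\left[\log\left|1 - \eta\, h_i\right|\right],
$$

so deviations from the minimum contract on average exactly when the Lyapunov exponent $\lambda(\eta)$ is negative.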
📝 Abstract
For overparameterized optimization tasks, such as those found in modern machine learning, global minima are generally not unique. In order to understand generalization in these settings, it is vital to study which minimum an optimization algorithm converges to. The possibility of having minima that are unstable under the dynamics imposed by the optimization algorithm limits the set of minima that the algorithm can find. In this paper, we characterize the global minima that are dynamically stable or unstable for both deterministic and stochastic gradient descent (SGD). In particular, we introduce a characteristic Lyapunov exponent which depends on the local dynamics around a global minimum and rigorously prove that the sign of this Lyapunov exponent determines whether SGD can accumulate at the respective global minimum.
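The criterion is straightforward to probe numerically. The sketch below (our illustration, not code from the paper; the two-point curvature distribution and the learning rates are assumptions chosen for contrast) estimates the Lyapunov exponent of SGD linearized around a one-dimensional global minimum and compares the verdict with that of deterministic gradient descent:

```python
# Illustrative sketch: Lyapunov-exponent stability test for SGD linearized
# around a 1-D global minimum. A step on a randomly sampled example with
# curvature h_i maps a deviation x_t to x_{t+1} = (1 - lr * h_i) * x_t, so the
# Lyapunov exponent is lambda = E[log|1 - lr * h_i|]; lambda < 0 means SGD can
# accumulate at the minimum, lambda > 0 means it escapes.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-example curvatures at the minimum (an assumption for this
# demo): half the examples are flat (h = 0.5), half are sharp (h = 10.0).
curvatures = np.array([0.5, 10.0])

def lyapunov_exponent(lr: float, h: np.ndarray) -> float:
    """Closed-form Lyapunov exponent of the linearized SGD map x -> (1 - lr*h_i)*x."""
    return float(np.mean(np.log(np.abs(1.0 - lr * h))))

def mc_estimate(lr: float, h: np.ndarray, steps: int = 200_000) -> float:
    """Monte Carlo estimate of the same quantity from a simulated SGD trajectory."""
    samples = h[rng.integers(0, len(h), size=steps)]
    return float(np.mean(np.log(np.abs(1.0 - lr * samples))))

def gd_stable(lr: float, h: np.ndarray) -> bool:
    """Deterministic GD only sees the averaged curvature: stable iff |1 - lr*mean(h)| < 1."""
    return abs(1.0 - lr * h.mean()) < 1.0

for lr in (0.05, 0.30):
    lam = lyapunov_exponent(lr, curvatures)
    print(f"lr={lr:.2f}: lambda={lam:+.3f} (MC {mc_estimate(lr, curvatures):+.3f}), "
          f"SGD {'stable' if lam < 0 else 'unstable'}, "
          f"GD {'stable' if gd_stable(lr, curvatures) else 'unstable'}")
```

At the larger learning rate, deterministic gradient descent, which only sees the averaged curvature, is still stable, yet the Lyapunov exponent is positive and SGD escapes: exactly the kind of minimum that the stochastic dynamics rule out.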