🤖 AI Summary
In overparameterized learning, global minima are generally not unique, and it remains unclear which of them are dynamically stable, and hence practically attainable, under stochastic gradient descent (SGD) as opposed to deterministic gradient descent.
Method: The authors introduce a characteristic Lyapunov exponent, a notion from dynamical systems theory that depends on the local dynamics around a global minimum, as a rigorous stability criterion, and use it to classify global minima as dynamically stable or unstable under both deterministic gradient descent and SGD.
Contribution/Results: They rigorously prove that the sign of this Lyapunov exponent determines whether SGD can accumulate at a given global minimum: only minima with a negative exponent are dynamically stable, while a positive exponent rules a minimum out as a limit of the stochastic dynamics. This stability-based selection acts as a form of implicit regularization, since SGD can only settle at the stable subset of global minima, and it provides a dynamical-systems perspective on generalization bias in overparameterized models.
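As an illustrative one-dimensional instance of this criterion (a sketch in the spirit of standard linear-stability analyses, not a formula quoted from the paper): near a global minimum, an SGD step with learning rate $\eta$ on a randomly sampled example with local curvature $h_{i_t}$ acts on a deviation $x_t$ as

$$
x_{t+1} = \left(1 - \eta\, h_{i_t}\right) x_t,
\qquad
\lambda(\eta) = \mathbb{E}_i\!\left[\log\left|1 - \eta\, h_i\right|\right],
$$

so deviations from the minimum contract on average exactly when the Lyapunov exponent $\lambda(\eta)$ is negative.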
📝 Abstract
For overparameterized optimization tasks, such as those found in modern machine learning, global minima are generally not unique. In order to understand generalization in these settings, it is vital to study which minimum an optimization algorithm converges to. The possibility of having minima that are unstable under the dynamics imposed by the optimization algorithm limits the set of minima that the algorithm can find. In this paper, we characterize the global minima that are dynamically stable or unstable for both deterministic and stochastic gradient descent (SGD). In particular, we introduce a characteristic Lyapunov exponent which depends on the local dynamics around a global minimum and rigorously prove that the sign of this Lyapunov exponent determines whether SGD can accumulate at the respective global minimum.
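The criterion is straightforward to probe numerically. The sketch below (our illustration, not code from the paper; the two-point curvature distribution and the learning rates are assumptions chosen for contrast) estimates the Lyapunov exponent of SGD linearized around a one-dimensional global minimum and compares the verdict with that of deterministic gradient descent:

```python
# Illustrative sketch: Lyapunov-exponent stability test for SGD linearized
# around a 1-D global minimum. A step on a randomly sampled example with
# curvature h_i maps a deviation x_t to x_{t+1} = (1 - lr * h_i) * x_t, so the
# Lyapunov exponent is lambda = E[log|1 - lr * h_i|]; lambda < 0 means SGD can
# accumulate at the minimum, lambda > 0 means it escapes.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-example curvatures at the minimum (an assumption for this
# demo): half the examples are flat (h = 0.5), half are sharp (h = 10.0).
curvatures = np.array([0.5, 10.0])

def lyapunov_exponent(lr: float, h: np.ndarray) -> float:
    """Closed-form Lyapunov exponent of the linearized SGD map x -> (1 - lr*h_i)*x."""
    return float(np.mean(np.log(np.abs(1.0 - lr * h))))

def mc_estimate(lr: float, h: np.ndarray, steps: int = 200_000) -> float:
    """Monte Carlo estimate of the same quantity from a simulated SGD trajectory."""
    samples = h[rng.integers(0, len(h), size=steps)]
    return float(np.mean(np.log(np.abs(1.0 - lr * samples))))

def gd_stable(lr: float, h: np.ndarray) -> bool:
    """Deterministic GD only sees the averaged curvature: stable iff |1 - lr*mean(h)| < 1."""
    return abs(1.0 - lr * h.mean()) < 1.0

for lr in (0.05, 0.30):
    lam = lyapunov_exponent(lr, curvatures)
    print(f"lr={lr:.2f}: lambda={lam:+.3f} (MC {mc_estimate(lr, curvatures):+.3f}), "
          f"SGD {'stable' if lam < 0 else 'unstable'}, "
          f"GD {'stable' if gd_stable(lr, curvatures) else 'unstable'}")
```

At the larger learning rate, deterministic gradient descent, which only sees the averaged curvature, is still stable, yet the Lyapunov exponent is positive and SGD escapes: exactly the kind of minimum that the stochastic dynamics rule out.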