Convergence of optimizers implies eigenvalue filtering at equilibrium

📅 2025-10-10
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work investigates the mechanistic basis for optimizer preference toward flat minima in deep neural networks, centered on the hypothesis that optimizer convergence behavior is governed by hyperparameter choices. Method: We establish that, at equilibrium, optimizers implicitly perform Hessian eigenvalue filtering: hyperparameter choices bias convergence toward either sharp or flat minima. Leveraging a generalized Hadamard–Perron stable manifold theorem, we develop a theoretical framework applicable to semialgebraic C² functions and design two novel algorithms that explicitly strengthen the preference for wide, flat minima. Contribution/Results: The proposed methods outperform standard gradient descent and Sharpness-Aware Minimization (SAM) in numerical experiments, achieving improved generalization while circumventing limitations of conventional stability analysis. Our framework provides a rigorous, geometrically grounded explanation for hyperparameter-dependent flat-minima selection and enables principled algorithmic design for enhanced generalization.

📝 Abstract
Ample empirical evidence in deep neural network training suggests that a variety of optimizers tend to find nearly global optima. In this article, we adopt the reversed perspective that convergence to an arbitrary point is assumed rather than proven, focusing on the consequences of this assumption. From this viewpoint, in line with recent advances on the edge-of-stability phenomenon, we argue that different optimizers effectively act as eigenvalue filters determined by their hyperparameters. Specifically, the standard gradient descent method inherently avoids the sharpest minima, whereas Sharpness-Aware Minimization (SAM) algorithms go even further by actively favoring wider basins. Inspired by these insights, we propose two novel algorithms that exhibit enhanced eigenvalue filtering, effectively promoting wider minima. Our theoretical analysis leverages a generalized Hadamard--Perron stable manifold theorem and applies to general semialgebraic $C^2$ functions, without requiring additional non-degeneracy conditions or global Lipschitz bound assumptions. We support our conclusions with numerical experiments on feed-forward neural networks.
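The eigenvalue-filtering claim can be illustrated with a minimal sketch (not the paper's algorithm) on a one-dimensional quadratic: gradient descent with learning rate `lr` is linearly stable at a minimum of curvature `c` only when `|1 - lr * c| < 1`, i.e. `c < 2 / lr`. The learning rate thus filters out minima whose Hessian eigenvalue exceeds that threshold. The curvature values and step count below are illustrative choices, not from the paper.

```python
def gd_on_quadratic(curvature, lr, x0=1.0, steps=50):
    """Run gradient descent on f(x) = curvature * x**2 / 2.

    The update is x <- x * (1 - lr * curvature), which converges
    iff |1 - lr * curvature| < 1, i.e. curvature < 2 / lr.
    """
    x = x0
    for _ in range(steps):
        x = x - lr * curvature * x
    return x

lr = 0.5  # stability threshold: curvature < 2 / lr = 4
flat = gd_on_quadratic(curvature=0.5, lr=lr)   # 0.5 < 4: converges to 0
sharp = gd_on_quadratic(curvature=5.0, lr=lr)  # 5.0 > 4: iterates blow up
print(abs(flat), abs(sharp))
```

Running this shows the iterate at the flat minimum shrinking toward zero while the iterate at the sharp minimum diverges: the same optimizer, with the same hyperparameters, is repelled from the sharper basin.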
Problem

Research questions and friction points this paper is trying to address.

Do optimizers in neural network training truly converge, and what follows if convergence is assumed rather than proven?
How do hyperparameter choices make different optimizers act as eigenvalue filters on minima?
Can algorithms be designed that promote wider minima through stronger eigenvalue filtering?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimizers at equilibrium act as Hessian eigenvalue filters determined by their hyperparameters
Two novel algorithms with enhanced filtering that promote wider minima
A generalized Hadamard–Perron stable manifold theorem enabling analysis of semialgebraic $C^2$ functions
Jerome Bolte
Toulouse School of Economics, Université Toulouse Capitole, Toulouse, France
Quoc-Tung Le
Univ. Grenoble Alpes, CNRS, LJK, Grenoble, France
Edouard Pauwels
Toulouse School of Economics
optimization, machine learning