Optimizers Qualitatively Alter Solutions And We Should Leverage This

📅 2025-07-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper challenges the prevailing focus on convergence speed in deep neural network (DNN) optimization, arguing that optimizers fundamentally shape qualitative solution properties, such as generalization, robustness, and expressive capacity, and thereby act as a critical source of *inductive bias*. Existing work overemphasizes efficiency while neglecting this structural role. To address this, the authors elevate the optimizer to a first-class design lever, on par with model architecture and data, and systematically analyze its qualitative influence on solution-space geometry. Leveraging nonlinear optimization theory and the empirical dynamics of first-order methods (e.g., SGD), they show that distinct optimization trajectories yield solutions with markedly different properties. The core contribution is a *bias modeling framework* for optimizers that enables the principled design of novel optimizers endowed with *targeted inductive biases*, providing both theoretical grounding and a practical methodology for embedding learning priors directly into the optimization process.

📝 Abstract
Due to the nonlinear nature of Deep Neural Networks (DNNs), one cannot guarantee convergence to a unique global minimum of the loss when using optimizers relying only on local information, such as SGD. Indeed, this was a primary source of skepticism regarding the feasibility of DNNs in the early days of the field. The past decades of progress in deep learning have revealed this skepticism to be misplaced, and a large body of empirical evidence shows that sufficiently large DNNs following standard training protocols exhibit well-behaved optimization dynamics that converge to performant solutions. This success has biased the community toward using convex optimization as a mental model for learning, leading to a focus on training efficiency, whether in terms of required iterations, FLOPs, or wall-clock time, when improving optimizers. We argue that, while this perspective has proven extremely fruitful, another perspective specific to DNNs has received considerably less attention: the optimizer not only influences the rate of convergence, but also the qualitative properties of the learned solutions. In other words, the optimizer can and will encode inductive biases and change the effective expressivity of a given class of models. Furthermore, we believe the optimizer can be an effective way of encoding desiderata in the learning process. We contend that the community should aim at understanding the biases of already existing methods, as well as aim to build new optimizers with the explicit intent of inducing certain properties of the solution, rather than solely judging them based on their convergence rates. We hope our arguments will inspire research to improve our understanding of how the learning process can impact the type of solution we converge to, and lead to a greater recognition of optimizer design as a critical lever that complements the roles of architecture and data in shaping model outcomes.
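The abstract's central claim, that the choice of optimizer selects among many equally low-loss solutions, can be illustrated with a classic result not specific to this paper: on an underdetermined least-squares problem, plain gradient descent initialized at zero converges to the minimum-ℓ2-norm interpolating solution, while a differently preconditioned update converges to a *different* interpolator with the same zero training loss. The sketch below is illustrative only (the toy problem, step size, and diagonal preconditioner are my own choices, not the paper's framework):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 20                          # underdetermined: more parameters than data points
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

def train(precond, steps=20_000, lr=1e-2):
    """Diagonally preconditioned gradient descent on squared loss, zero init."""
    w = np.zeros(d)
    for _ in range(steps):
        grad = X.T @ (X @ w - y)
        w -= lr * precond * grad      # elementwise diagonal preconditioner
    return w

w_gd  = train(np.ones(d))                        # plain gradient descent
w_pre = train(rng.uniform(0.1, 1.0, size=d))     # a "different optimizer"

# Closed-form minimum-l2-norm interpolator for comparison.
w_min_norm = X.T @ np.linalg.solve(X @ X.T, y)

# Both optimizers fit the training data essentially perfectly...
print(np.linalg.norm(X @ w_gd - y), np.linalg.norm(X @ w_pre - y))
# ...but they converge to qualitatively different solutions:
# plain GD recovers the min-norm interpolator, the preconditioned run does not.
print(np.linalg.norm(w_gd - w_min_norm), np.linalg.norm(w_pre - w_min_norm))
```

Both runs drive the training loss to zero, so convergence-rate metrics alone cannot distinguish them; the solutions differ only in properties like parameter norm, which is exactly the kind of optimizer-induced inductive bias the paper argues deserves first-class attention.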
Problem

Research questions and friction points this paper is trying to address.

Optimizers affect solution properties in DNNs
Current focus overlooks optimizer-induced inductive biases
Need to design optimizers for desired solution traits
Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimizers alter DNN solutions' qualitative properties
Optimizers encode inductive biases in models
Design optimizers to shape solution properties