Optimizers Qualitatively Alter Solutions And We Should Leverage This

📅 2025-07-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper challenges the prevailing focus on convergence speed in deep neural network (DNN) optimization, arguing that optimizers fundamentally shape qualitative solution properties, such as generalization, robustness, and expressive capacity, and thereby act as a critical source of *inductive bias*. Existing work overemphasizes efficiency while neglecting this structural role. To address this, the authors elevate the optimizer to a first-class design lever, on par with model architecture and data, and systematically analyze its qualitative influence on solution-space geometry. Leveraging nonlinear optimization theory and the empirical dynamics of first-order methods (e.g., SGD), they show that distinct optimization trajectories yield solutions with markedly different properties. The core contribution is a *bias modeling framework* for optimizers that enables the principled design of novel optimizers endowed with *targeted inductive biases*, providing both theoretical grounding and a practical methodology for embedding learning priors directly into the optimization process.

📝 Abstract
Due to the nonlinear nature of Deep Neural Networks (DNNs), one cannot guarantee convergence to a unique global minimum of the loss when using optimizers relying only on local information, such as SGD. Indeed, this was a primary source of skepticism regarding the feasibility of DNNs in the early days of the field. The past decades of progress in deep learning have revealed this skepticism to be misplaced, and a large body of empirical evidence shows that sufficiently large DNNs following standard training protocols exhibit well-behaved optimization dynamics that converge to performant solutions. This success has biased the community toward using convex optimization as a mental model for learning, leading to a focus on training efficiency, whether in terms of required iterations, FLOPs, or wall-clock time, when improving optimizers. We argue that, while this perspective has proven extremely fruitful, another perspective specific to DNNs has received considerably less attention: the optimizer not only influences the rate of convergence, but also the qualitative properties of the learned solutions. In other words, the optimizer can and will encode inductive biases and change the effective expressivity of a given class of models. Furthermore, we believe the optimizer can be an effective way of encoding desiderata in the learning process. We contend that the community should aim at understanding the biases of already existing methods, as well as aim to build new optimizers with the explicit intent of inducing certain properties of the solution, rather than solely judging them based on their convergence rates. We hope our arguments will inspire research to improve our understanding of how the learning process can impact the type of solution we converge to, and lead to a greater recognition of optimizer design as a critical lever that complements the roles of architecture and data in shaping model outcomes.
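The abstract's central claim, that the choice of optimizer selects among many equally low-loss solutions, can be illustrated with a classic result not specific to this paper: on an underdetermined least-squares problem, plain gradient descent initialized at zero converges to the minimum-ℓ2-norm interpolating solution, while a differently preconditioned update converges to a *different* interpolator with the same zero training loss. The sketch below is illustrative only (the toy problem, step size, and diagonal preconditioner are my own choices, not the paper's framework):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 20                          # underdetermined: more parameters than data points
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

def train(precond, steps=20_000, lr=1e-2):
    """Diagonally preconditioned gradient descent on squared loss, zero init."""
    w = np.zeros(d)
    for _ in range(steps):
        grad = X.T @ (X @ w - y)
        w -= lr * precond * grad      # elementwise diagonal preconditioner
    return w

w_gd  = train(np.ones(d))                        # plain gradient descent
w_pre = train(rng.uniform(0.1, 1.0, size=d))     # a "different optimizer"

# Closed-form minimum-l2-norm interpolator for comparison.
w_min_norm = X.T @ np.linalg.solve(X @ X.T, y)

# Both optimizers fit the training data essentially perfectly...
print(np.linalg.norm(X @ w_gd - y), np.linalg.norm(X @ w_pre - y))
# ...but they converge to qualitatively different solutions:
# plain GD recovers the min-norm interpolator, the preconditioned run does not.
print(np.linalg.norm(w_gd - w_min_norm), np.linalg.norm(w_pre - w_min_norm))
```

Both runs drive the training loss to zero, so convergence-rate metrics alone cannot distinguish them; the solutions differ only in properties like parameter norm, which is exactly the kind of optimizer-induced inductive bias the paper argues deserves first-class attention.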
Problem

Research questions and friction points this paper is trying to address.

Optimizers affect solution properties in DNNs
Current focus overlooks optimizer-induced inductive biases
Need to design optimizers for desired solution traits
Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimizers alter DNN solutions' qualitative properties
Optimizers encode inductive biases in models
Design optimizers to shape solution properties