A Tale of Two Geometries: Adaptive Optimizers and Non-Euclidean Descent

📅 2025-11-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work establishes a theoretical connection between adaptive optimizers and normalized steepest descent (NSD) in non-Euclidean spaces, showing that the geometries underlying their analyses lead to fundamentally different convergence guarantees. To cover both non-convex and convex optimization, we replace the standard smoothness assumption with *adaptive smoothness* and introduce *adaptive gradient variance* to characterize the noise structure in stochastic settings. Building on these notions, we develop a unified convergence analysis framework over non-Euclidean geometries. Our key contributions are: (1) the first proof that adaptive optimizers can achieve Nesterov-type momentum acceleration in non-Euclidean spaces, a guarantee unattainable under standard smoothness; and (2) dimension-free convergence rates under specific non-Euclidean metrics. These results provide a unified explanation for the empirical superiority of adaptive methods across diverse geometries and stochastic regimes, offering a new theoretical paradigm for understanding and designing adaptive optimization algorithms.
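
For reference, the NSD update the summary alludes to can be written as below; this is the standard formulation of normalized steepest descent over a general norm, not a restatement of the paper's exact algorithm or step-size convention.

```latex
% Normalized steepest descent w.r.t. a norm \|\cdot\| (standard formulation;
% the paper's normalization and step-size convention may differ).
x_{t+1} = x_t - \eta_t\,\Delta_t,
\qquad
\Delta_t \in \operatorname*{arg\,max}_{\|\Delta\| \le 1} \langle \nabla f(x_t), \Delta \rangle.
% For \|\cdot\| = \|\cdot\|_2 this gives \Delta_t = \nabla f(x_t)/\|\nabla f(x_t)\|_2;
% for \|\cdot\| = \|\cdot\|_\infty it gives sign descent, \Delta_t = \operatorname{sign}(\nabla f(x_t)).
```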

📝 Abstract
Adaptive optimizers can reduce to normalized steepest descent (NSD) when only adapting to the current gradient, suggesting a close connection between the two algorithmic families. A key distinction between their analyses, however, lies in the geometries (e.g., smoothness notions) they rely on. In the convex setting, adaptive optimizers are governed by a stronger adaptive smoothness condition, while NSD relies on the standard notion of smoothness. We extend the theory of adaptive smoothness to the nonconvex setting and show that it precisely characterizes the convergence of adaptive optimizers. Moreover, we establish that adaptive smoothness enables acceleration of adaptive optimizers with Nesterov momentum in the convex setting, a guarantee unattainable under standard smoothness for certain non-Euclidean geometries. We further develop an analogous comparison for stochastic optimization by introducing adaptive gradient variance, which parallels adaptive smoothness and leads to dimension-free convergence guarantees that cannot be achieved under standard gradient variance for certain non-Euclidean geometries.
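
As a concrete illustration of the reduction in the abstract's first sentence, here is a minimal sketch (not the authors' code; function names and Adam-style hyperparameters are illustrative) showing that an Adam-like update with no moment accumulation collapses to sign descent, i.e., NSD with respect to the ℓ∞ norm:

```python
import numpy as np

def adam_style_step(x, grad, lr, beta1=0.0, beta2=0.0, eps=1e-12, state=None):
    """One Adam-style step. With beta1 = beta2 = 0 the preconditioner is built
    from the current gradient only, so the update is grad / (|grad| + eps),
    which approaches sign(grad): NSD with respect to the l_inf norm."""
    if state is None:
        state = {"m": np.zeros_like(x), "v": np.zeros_like(x)}
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad       # first moment
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2  # second moment
    update = state["m"] / (np.sqrt(state["v"]) + eps)
    return x - lr * update, state

def sign_descent_step(x, grad, lr):
    """Normalized steepest descent for the l_inf norm: move along sign(grad)."""
    return x - lr * np.sign(grad)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(5)
    grad = rng.standard_normal(5)
    x_adam, _ = adam_style_step(x, grad, lr=0.1)  # beta1 = beta2 = 0
    x_sign = sign_descent_step(x, grad, lr=0.1)
    print(np.max(np.abs(x_adam - x_sign)))        # ~0, up to the eps smoothing
```

With nonzero beta1 and beta2 the method accumulates moment estimates and no longer matches NSD step for step, which is exactly where the adaptive smoothness and adaptive gradient variance analyses come in.
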
Problem

Research questions and friction points this paper is trying to address.

Extends adaptive smoothness theory to nonconvex optimization settings
Characterizes convergence of adaptive optimizers using geometry distinctions
Develops stochastic analysis with adaptive gradient variance for non-Euclidean geometry
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive smoothness extends to nonconvex optimization
Adaptive smoothness enables Nesterov momentum acceleration
Adaptive gradient variance ensures dimension-free convergence guarantees
Shuo Xie
Toyota Technological Institute at Chicago
machine learning, optimization
Tianhao Wang
University of California, San Diego
Beining Wu
University of Chicago
Zhiyuan Li
Toyota Technological Institute at Chicago