Preconditioned Norms: A Unified Framework for Steepest Descent, Quasi-Newton and Adaptive Methods

📅 2025-10-12
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Deep learning optimizers face a fundamental trade-off between geometric adaptability and curvature exploitation: steepest descent adapts to diverse geometries but uses only first-order information, whereas quasi-Newton and adaptive methods incorporate curvature yet are tied to the Frobenius norm, limiting their applicability. This work proposes a unified framework based on preconditioned matrix norms, establishing for the first time necessary and sufficient conditions for affine invariance and scale invariance under generalized norms. The analysis shows that SGD, Adam, and Muon, along with KL-Shampoo, SOAP, and SPlus, all emerge as special cases of the same principle. Leveraging this insight, the authors design MuAdam and MuAdam-SANIA, optimizers that jointly exploit problem geometry and curvature information. Experiments show these optimizers are competitive with, and in some cases outperform, existing state-of-the-art methods. The code is publicly available.
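For background, the invariance notions named above have standard formalizations in optimization (general textbook definitions, not the paper's exact statements): an update rule $x_{k+1} = x_k - \eta P_k \nabla f(x_k)$ is affine invariant if, for every invertible $A$ and reparameterized objective $\tilde{f}(y) = f(Ay)$, running the method on $\tilde{f}$ from $y_0 = A^{-1} x_0$ yields iterates $y_k = A^{-1} x_k$ for all $k$. Newton's method ($P_k = \nabla^2 f(x_k)^{-1}$) satisfies this; plain gradient descent ($P_k = I$) does not, which is why the choice of norm and preconditioner matters.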

📝 Abstract
Optimization lies at the core of modern deep learning, yet existing methods often face a fundamental trade-off between adapting to problem geometry and exploiting curvature information. Steepest descent algorithms adapt to different geometries through norm choices but remain strictly first-order, whereas quasi-Newton and adaptive optimizers incorporate curvature information but are restricted to Frobenius geometry, limiting their applicability across diverse architectures. In this work, we propose a unified framework generalizing steepest descent, quasi-Newton methods, and adaptive methods through the novel notion of preconditioned matrix norms. This abstraction reveals that widely used optimizers such as SGD and Adam, as well as more advanced approaches like Muon and KL-Shampoo, and recent hybrids including SOAP and SPlus, all emerge as special cases of the same principle. Within this framework, we provide the first systematic treatment of affine and scale invariance in the matrix-parameterized setting, establishing necessary and sufficient conditions under generalized norms. Building on this foundation, we introduce two new methods, $\texttt{MuAdam}$ and $\texttt{MuAdam-SANIA}$, which combine the spectral geometry of Muon with Adam-style preconditioning. Our experiments demonstrate that these optimizers are competitive with, and in some cases outperform, existing state-of-the-art methods. Our code is available at https://github.com/brain-lab-research/LIB/tree/quasi_descent.
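To make the unifying abstraction concrete, here is a hedged sketch of the norm-based view the abstract describes (my notation; the paper's exact definitions may differ). Steepest descent under a norm $\|\cdot\|$ picks the update

$$X_{k+1} = X_k + \arg\min_{\Delta} \left\{ \langle \nabla f(X_k), \Delta \rangle + \tfrac{1}{2\eta} \|\Delta\|^2 \right\}.$$

A preconditioned matrix norm of the form $\|X\|_{P,Q} = \|P^{1/2} X Q^{1/2}\|$ then recovers familiar optimizers by the choice of base norm and preconditioners: the Frobenius norm with $P = Q = I$ gives SGD, a diagonal $P$ built from gradient second moments gives Adam-style updates, and the spectral norm gives Muon.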
Problem

Research questions and friction points this paper is trying to address.

Unifying steepest descent, quasi-Newton and adaptive optimization methods
Addressing the trade-off between geometry adaptation and curvature utilization
Establishing invariance conditions under generalized matrix-parameterized norms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified framework based on preconditioned matrix norms
Generalizes steepest descent, quasi-Newton, and adaptive methods under a single principle
Introduces MuAdam and MuAdam-SANIA, combining Muon's spectral geometry with Adam-style preconditioning (see the sketch after this list)
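To ground the MuAdam idea, below is a minimal, hypothetical sketch of one such step in PyTorch: Adam-style second-moment preconditioning followed by Muon's Newton-Schulz orthogonalization. The function names, signatures, and hyperparameters are my own illustration inferred from the abstract, not the authors' released code (see their repository for the actual implementation).

import torch

def newton_schulz_orthogonalize(G, steps=5, eps=1e-7):
    # Approximately map G onto a nearby semi-orthogonal matrix -- the step
    # Muon uses to follow spectral-norm geometry. The quintic coefficients
    # below match the widely circulated Muon reference implementation.
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (G.norm() + eps)  # normalize so the iteration converges
    transposed = G.size(0) > G.size(1)
    if transposed:
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * (A @ A)) @ X
    return X.T if transposed else X

def muadam_step(W, grad, state, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
    # One hypothetical MuAdam-style update: precondition the gradient with
    # Adam's bias-corrected moment estimates, then orthogonalize the result.
    state["t"] += 1
    t = state["t"]
    state["m"].mul_(betas[0]).add_(grad, alpha=1 - betas[0])            # 1st moment
    state["v"].mul_(betas[1]).addcmul_(grad, grad, value=1 - betas[1])  # 2nd moment
    m_hat = state["m"] / (1 - betas[0] ** t)
    v_hat = state["v"] / (1 - betas[1] ** t)
    direction = m_hat / (v_hat.sqrt() + eps)         # Adam-style preconditioning
    update = newton_schulz_orthogonalize(direction)  # Muon-style spectral step
    W.add_(update, alpha=-lr)

# Usage on a 2-D weight matrix (Muon applies to matrix parameters):
#   W = torch.randn(256, 128, requires_grad=True)
#   state = {"m": torch.zeros_like(W), "v": torch.zeros_like(W), "t": 0}
#   ... backward pass fills W.grad ...
#   with torch.no_grad(): muadam_step(W, W.grad, state)

The point of the combination, as the abstract frames it: the second-moment preconditioner carries curvature information, while the orthogonalization restores the spectral geometry that plain Adam (implicitly Frobenius) lacks.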
Andrey Veprikov
Unknown affiliation
Optimization, ML, DL
Arman Bolatov
Mohamed bin Zayed University of Artificial Intelligence (MBZUAI)
Samuel Horváth
Mohamed bin Zayed University of Artificial Intelligence (MBZUAI)
Aleksandr Beznosikov
PhD, Basic Research of Artificial Intelligence Lab
Optimization, Machine Learning
Martin Takáč
Mohamed bin Zayed University of Artificial Intelligence (MBZUAI)
Slavomir Hanzely
Mohamed bin Zayed University of Artificial Intelligence (MBZUAI)