HVAdam: A Full-Dimension Adaptive Optimizer

📅 2025-11-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Adaptive optimizers (e.g., Adam) excel in large-scale models but often underperform SGD on classical architectures like CNNs, primarily due to rigid preconditioning that limits scenario-specific adaptability. To address this, we propose Anon, a novel optimizer that unifies the strengths of SGD and Adam via a tunable adaptivity mechanism, enabling continuous interpolation and extrapolation between SGD-like and Adam-like behavior. We further introduce the incremental delay update (IDU) mechanism to enhance training stability. Theoretically, we establish convergence guarantees for Anon under both convex and non-convex settings. Empirically, Anon consistently outperforms state-of-the-art optimizers across diverse tasks, including image classification, diffusion modeling, and language modeling, demonstrating the efficacy of *adaptivity as a controllable design principle*.
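
To make the tunable-adaptivity idea concrete, below is a minimal sketch of what a continuously adjustable preconditioner could look like. The exponent name `adaptivity`, the bias-correction details, and the exact update form are illustrative assumptions; the paper's actual rule may differ.

```python
import torch

def tunable_adaptivity_step(param, grad, state, lr=1e-3, beta1=0.9,
                            beta2=0.999, adaptivity=1.0, eps=1e-8):
    """One illustrative parameter update with a tunable adaptivity exponent.

    adaptivity = 0.0 recovers SGD with momentum (no preconditioning),
    adaptivity = 1.0 recovers an Adam-like step, intermediate values
    interpolate, and values outside [0, 1] extrapolate beyond both.
    """
    state["t"] += 1
    t = state["t"]
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad        # first moment
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2   # second moment
    m_hat = state["m"] / (1 - beta1 ** t)                       # bias correction
    v_hat = state["v"] / (1 - beta2 ** t)
    # Raising the Adam denominator to a tunable power morphs the update
    # continuously between SGD-like and Adam-like preconditioning.
    denom = (v_hat.sqrt() + eps) ** adaptivity
    param.data -= lr * m_hat / denom

# Usage with a single tensor parameter:
w = torch.zeros(3)
opt_state = {"m": torch.zeros_like(w), "v": torch.zeros_like(w), "t": 0}
g = torch.tensor([0.1, -0.2, 0.3])
tunable_adaptivity_step(w, g, opt_state, adaptivity=0.5)  # halfway behavior
```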

📝 Abstract
Adaptive optimizers such as Adam have achieved great success in training large-scale models like large language models and diffusion models. However, they often generalize worse than non-adaptive methods, such as SGD, on classical architectures like CNNs. We identify a key cause of this performance gap: restricted adaptivity in preconditioners, which limits the optimizer's ability to adapt to diverse optimization landscapes. To address this, we propose Anon (Adaptivity Non-restricted Optimizer with Novel convergence technique), a novel optimizer with continuously tunable adaptivity, allowing it to interpolate between SGD-like and Adam-like behaviors and even extrapolate beyond both. To ensure convergence across the entire adaptivity spectrum, we introduce the incremental delay update (IDU), a novel mechanism that is more flexible than AMSGrad's hard max-tracking strategy and enhances robustness to gradient noise. We theoretically establish convergence guarantees under both convex and non-convex settings. Empirically, Anon consistently outperforms state-of-the-art optimizers on representative image classification, diffusion, and language modeling tasks. These results demonstrate that adaptivity can serve as a valuable, tunable design principle, and that Anon provides the first unified and reliable framework capable of bridging the gap between classical and modern optimizers and even surpassing the advantageous properties of both.
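
The abstract contrasts IDU with AMSGrad's hard max-tracking. As a point of reference, the sketch below places AMSGrad's rule next to one hypothetical "incremental" relaxation of it; the actual IDU mechanism is not specified in this summary, so the `delay` parameter and the smoothed rule are illustrative assumptions only.

```python
import torch

def amsgrad_track(v_hat, v_max):
    # AMSGrad: the tracked second moment is monotonically non-decreasing,
    # which secures convergence but lets a single noisy gradient raise the
    # preconditioner permanently.
    return torch.maximum(v_max, v_hat)

def incremental_delay_track(v_hat, v_track, delay=0.99):
    # Hypothetical reading of an "incremental delay update": the tracked
    # statistic moves toward the max only gradually, so transient gradient
    # noise is absorbed instead of locked in. Not the paper's exact rule.
    return delay * v_track + (1 - delay) * torch.maximum(v_track, v_hat)

# Usage: after a noise spike, the AMSGrad track jumps and stays up, while
# the delayed track rises only incrementally.
v_hat = torch.tensor([1.0, 4.0, 1.0])   # middle entry is a noisy spike
print(amsgrad_track(v_hat, torch.ones(3)))           # tensor([1., 4., 1.])
print(incremental_delay_track(v_hat, torch.ones(3))) # tensor([1.00, 1.03, 1.00])
```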
Problem

Research questions and friction points this paper is trying to address.

Addresses poor generalization of adaptive optimizers compared to non-adaptive methods
Enables tunable adaptivity to bridge SGD-like and Adam-like optimization behaviors
Provides convergence guarantees across adaptivity spectrum for diverse optimization landscapes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Tunable adaptivity optimizer bridging SGD and Adam
Incremental delay update for enhanced convergence
Unified framework outperforming classical and modern optimizers
👥 Authors
Yiheng Zhang, School of Computer Science, Wuhan University
Shaowu Wu, Wuhan University
Yuanzhuo Xu, School of Computer Science, Wuhan University
Jiajun Wu, Department of Electrical and Software Engineering, University of Calgary
Shang Xu, Department of Computer Science, University College London
Steve Drew, Assistant Professor, University of Calgary
Xiaoguang Niu, School of Computer Science, Wuhan University