HVAdam: A Full-Dimension Adaptive Optimizer

📅 2025-11-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Adaptive optimizers (e.g., Adam) excel in large-scale models but often underperform SGD on classical architectures like CNNs, primarily due to rigid preconditioning that limits scenario-specific adaptability. To address this, we propose Anon, a novel optimizer that unifies the strengths of SGD and Adam via a tunable adaptivity mechanism, enabling continuous interpolation and extrapolation between SGD-like and Adam-like behavior. We further introduce the incremental delay update (IDU) mechanism to enhance training stability. Theoretically, we establish convergence guarantees for Anon under both convex and non-convex settings. Empirically, Anon consistently outperforms state-of-the-art optimizers across diverse tasks, including image classification, diffusion modeling, and language modeling, demonstrating the efficacy of *adaptivity as a controllable design principle*.
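
To make the tunable-adaptivity idea concrete, below is a minimal sketch of what a continuously adjustable preconditioner could look like. The exponent name `adaptivity`, the bias-correction details, and the exact update form are illustrative assumptions; the paper's actual rule may differ.

```python
import torch

def tunable_adaptivity_step(param, grad, state, lr=1e-3, beta1=0.9,
                            beta2=0.999, adaptivity=1.0, eps=1e-8):
    """One illustrative parameter update with a tunable adaptivity exponent.

    adaptivity = 0.0 recovers SGD with momentum (no preconditioning),
    adaptivity = 1.0 recovers an Adam-like step, intermediate values
    interpolate, and values outside [0, 1] extrapolate beyond both.
    """
    state["t"] += 1
    t = state["t"]
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad        # first moment
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2   # second moment
    m_hat = state["m"] / (1 - beta1 ** t)                       # bias correction
    v_hat = state["v"] / (1 - beta2 ** t)
    # Raising the Adam denominator to a tunable power morphs the update
    # continuously between SGD-like and Adam-like preconditioning.
    denom = (v_hat.sqrt() + eps) ** adaptivity
    param.data -= lr * m_hat / denom

# Usage with a single tensor parameter:
w = torch.zeros(3)
opt_state = {"m": torch.zeros_like(w), "v": torch.zeros_like(w), "t": 0}
g = torch.tensor([0.1, -0.2, 0.3])
tunable_adaptivity_step(w, g, opt_state, adaptivity=0.5)  # halfway behavior
```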

📝 Abstract
Adaptive optimizers such as Adam have achieved great success in training large-scale models like large language models and diffusion models. However, they often generalize worse than non-adaptive methods, such as SGD, on classical architectures like CNNs. We identify a key cause of this performance gap: restricted adaptivity in preconditioners, which limits the optimizer's ability to adapt to diverse optimization landscapes. To address this, we propose Anon (Adaptivity Non-restricted Optimizer with Novel convergence technique), a novel optimizer with continuously tunable adaptivity, allowing it to interpolate between SGD-like and Adam-like behaviors and even extrapolate beyond both. To ensure convergence across the entire adaptivity spectrum, we introduce the incremental delay update (IDU), a novel mechanism that is more flexible than AMSGrad's hard max-tracking strategy and enhances robustness to gradient noise. We theoretically establish convergence guarantees under both convex and non-convex settings. Empirically, Anon consistently outperforms state-of-the-art optimizers on representative image classification, diffusion, and language modeling tasks. These results demonstrate that adaptivity can serve as a valuable, tunable design principle, and that Anon provides the first unified and reliable framework capable of bridging the gap between classical and modern optimizers and even surpassing the advantageous properties of both.
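
The abstract contrasts IDU with AMSGrad's hard max-tracking. As a point of reference, the sketch below places AMSGrad's rule next to one hypothetical "incremental" relaxation of it; the actual IDU mechanism is not specified in this summary, so the `delay` parameter and the smoothed rule are illustrative assumptions only.

```python
import torch

def amsgrad_track(v_hat, v_max):
    # AMSGrad: the tracked second moment is monotonically non-decreasing,
    # which secures convergence but lets a single noisy gradient raise the
    # preconditioner permanently.
    return torch.maximum(v_max, v_hat)

def incremental_delay_track(v_hat, v_track, delay=0.99):
    # Hypothetical reading of an "incremental delay update": the tracked
    # statistic moves toward the max only gradually, so transient gradient
    # noise is absorbed instead of locked in. Not the paper's exact rule.
    return delay * v_track + (1 - delay) * torch.maximum(v_track, v_hat)

# Usage: after a noise spike, the AMSGrad track jumps and stays up, while
# the delayed track rises only incrementally.
v_hat = torch.tensor([1.0, 4.0, 1.0])   # middle entry is a noisy spike
print(amsgrad_track(v_hat, torch.ones(3)))           # tensor([1., 4., 1.])
print(incremental_delay_track(v_hat, torch.ones(3))) # tensor([1.00, 1.03, 1.00])
```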
Problem

Research questions and friction points this paper is trying to address.

Addresses poor generalization of adaptive optimizers compared to non-adaptive methods
Enables tunable adaptivity to bridge SGD-like and Adam-like optimization behaviors
Provides convergence guarantees across adaptivity spectrum for diverse optimization landscapes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Tunable adaptivity optimizer bridging SGD and Adam
Incremental delay update for enhanced convergence
Unified framework outperforming classical and modern optimizers
👥 Authors
Yiheng Zhang, School of Computer Science, Wuhan University
Shaowu Wu, Wuhan University
Yuanzhuo Xu, School of Computer Science, Wuhan University
Jiajun Wu, Department of Electrical and Software Engineering, University of Calgary
Shang Xu, Department of Computer Science, University College London
Steve Drew, Assistant Professor, University of Calgary
Xiaoguang Niu, School of Computer Science, Wuhan University