🤖 AI Summary
This work addresses the trade-off between robustness and generalization in deep learning optimizers: AdamW tends to be robust across training scenarios but sometimes generalizes worse than momentum methods, whereas momentum SGD can generalize better after careful tuning yet is more sensitive to hyperparameters and gradient scale. To combine these strengths, the paper proposes Ada2MS, an optimizer that interpolates continuously and exponentially between elementwise and global second-moment estimates. Varying the interpolation moves the update smoothly between AdamW-like behavior (per-element learning-rate scaling) and momentum-SGD-like behavior (a single global scale). On the vision tasks evaluated, Ada2MS obtains competitive results under a unified optimizer-comparison protocol.
📝 Abstract
Optimization algorithms are core methods by which machine learning models iteratively minimize loss functions, update parameters, learn from data, and improve performance. Momentum SGD and AdamW represent two important optimization paradigms. AdamW produces stable updates and usually has strong robustness across training scenarios, but its generalization performance is sometimes weaker than that of momentum methods. Momentum SGD can often obtain better generalization after careful tuning, but it is more sensitive to gradient-scale variation and hyperparameter settings. To balance the strengths and weaknesses of the two paradigms, this paper proposes Ada2MS, an optimization algorithm that achieves a smooth transition between AdamW-like behavior and momentum-SGD-like behavior through continuous exponential interpolation between elementwise second-moment estimates and global second-moment estimates. On the vision tasks evaluated in this study, Ada2MS obtains competitive results under a unified optimizer-comparison protocol. The code will be released at https://github.com/mengzhu0308/Ada2MS
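The abstract does not give the update rule, so the following is only a rough sketch of one way the interpolation could look, not the paper's algorithm. It assumes that "exponential interpolation" means a geometric (log-linear) blend, that the global second moment is the mean of the elementwise estimates, and that bias correction can be omitted for brevity. The names `blended_second_moment`, `sketch_step`, and the exponent `p` are hypothetical.

```python
import math


def blended_second_moment(v, p):
    """Geometric blend of elementwise and global second moments (assumed form).

    v: list of non-negative elementwise second-moment estimates.
    p: interpolation exponent in [0, 1] (hypothetical parameter).
       p = 1 keeps each element's own estimate (AdamW-like scaling);
       p = 0 gives every element the same global mean (one shared scale,
       so the step follows the momentum direction, as in momentum SGD).
    """
    v_global = sum(v) / len(v)  # assumption: global estimate = mean of v
    return [(vi ** p) * (v_global ** (1.0 - p)) for vi in v]


def sketch_step(theta, grad, m, v, p, lr=1e-3, beta1=0.9, beta2=0.999,
                eps=1e-8, weight_decay=0.0):
    """One illustrative update with decoupled weight decay.

    Bias correction is deliberately left out; this is not the paper's rule.
    """
    # Exponential moving averages of the gradient and the squared gradient.
    m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, grad)]
    v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, grad)]
    # Denominator interpolated between elementwise and global estimates.
    d = blended_second_moment(v, p)
    theta = [ti - lr * (mi / (math.sqrt(di) + eps) + weight_decay * ti)
             for ti, mi, di in zip(theta, m, d)]
    return theta, m, v
```

Under these assumptions, `p = 1` normalizes each coordinate by its own gradient scale, while `p = 0` divides all coordinates by one scalar, which preserves the relative magnitudes of the momentum vector. Intermediate values of `p` move continuously between the two regimes.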