AI Summary
This work addresses the lack of theoretical convergence guarantees for Adam-type methods in stochastic smooth convex optimization by proposing Adam-SHANG, which couples momentum, adaptive preconditioning, and a curvature-aware correction through a more stable lagged-preconditioner update. The authors prove convergence in expectation without requiring the second-moment sequence to be globally monotone; the proof holds under an admissible stepsize condition that a conservative spectral bound always satisfies. As a less conservative practical alternative, they introduce a computable trace-ratio stepsize rule motivated by a local coordinatewise alignment condition. The guarantees come from a Lyapunov analysis combined with an adaptive learning rate–momentum coupling. Experiments confirm the predicted decay of the stochastic error and show that Adam-SHANG is competitive with Adam and AdamW in training performance on deep learning tasks.
Abstract
We propose Adam-SHANG, a Lyapunov-guided Adam-type method that couples momentum, adaptive preconditioning, and a curvature-aware correction through a more stable lagged-preconditioner update. For stochastic smooth convex optimization, we prove convergence in expectation under an admissible stepsize condition that can always be satisfied by a conservative spectral bound, without imposing global monotonicity on the second-moment sequence. To obtain a less conservative practical rule, we introduce a computable trace-ratio stepsize, motivated by a local coordinatewise alignment condition. The same structural update is also tested beyond the convex setting with simplified parameters. Experiments validate the predicted stochastic decay and show competitive training performance against Adam and AdamW on deep learning tasks.
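The abstract does not state the update equations, so the following is only a minimal sketch of the general idea of a lagged preconditioner: the step is scaled by the second-moment estimate from the previous iteration, and that estimate is refreshed only afterwards. It is a plain Adam-style step without bias correction, not the Adam-SHANG update; the curvature-aware correction and the trace-ratio stepsize are omitted because their form is not given here. The names `lagged_adam_step` and `run_demo`, the hyperparameter values, and the noisy quadratic test problem are all assumptions made for illustration.

```python
import numpy as np


def lagged_adam_step(x, m, v_prev, grad, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam-style step preconditioned by the PREVIOUS second moment.

    Illustrative assumption only: this is not the Adam-SHANG update.
    """
    # Momentum (first-moment) estimate uses the fresh stochastic gradient.
    m_new = beta1 * m + (1.0 - beta1) * grad
    # The step is scaled by v_prev, which does not depend on the current gradient.
    x_new = x - lr * m_new / (np.sqrt(v_prev) + eps)
    # The second moment is refreshed only after the step, for use next iteration.
    v_new = beta2 * v_prev + (1.0 - beta2) * grad ** 2
    return x_new, m_new, v_new


def run_demo(steps=500, seed=0):
    """Minimize a noisy convex quadratic f(x) = 0.5 * sum(scales * x**2)."""
    rng = np.random.default_rng(seed)
    scales = np.array([1.0, 10.0])  # hypothetical per-coordinate curvatures

    def f(x):
        return 0.5 * np.sum(scales * x ** 2)

    x = np.array([1.0, 1.0])
    m = np.zeros_like(x)
    v = np.ones_like(x)  # nonzero start so the first lagged step is well scaled
    f_start = f(x)
    for _ in range(steps):
        # Exact gradient plus small Gaussian noise stands in for a stochastic oracle.
        grad = scales * x + 0.01 * rng.standard_normal(x.shape)
        x, m, v = lagged_adam_step(x, m, v, grad)
    return f_start, f(x)


if __name__ == "__main__":
    f_start, f_end = run_demo()
    print(f"f(x_0) = {f_start:.4f}, f(x_T) = {f_end:.6f}")
```

Because `v_prev` is computed before the current gradient is drawn, the scaling applied to the step is independent of the current gradient noise. That is a common motivation for delayed preconditioners in analyses of adaptive methods; whether it is the mechanism the paper's proof relies on is not stated in the abstract.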