🤖 AI Summary
This work addresses slow convergence and susceptibility to entrapment at stationary points in both convex and smooth nonconvex optimization. We propose HOME-3, a higher-order momentum estimator built from the cubic power of first-order gradients. To our knowledge, this is the first systematic incorporation of third-power gradient terms into momentum construction, coupled with a gradient-weighted update mechanism that enhances directional discrimination. Theoretically, we use a Lyapunov function analysis to establish tightened convergence bounds and extend the framework to nonsmooth nonconvex settings. Empirically, HOME-3 consistently outperforms mainstream optimizers, including Adam and SGD with momentum, across convex, smooth nonconvex, and nonsmooth nonconvex tasks (e.g., deep neural network training), achieving up to 2.1× faster convergence, more stable generalization, and better saddle-point escape.
📝 Abstract
Momentum-based gradient methods are essential for optimizing advanced machine learning models: they accelerate convergence and help optimizers escape stationary points. While most state-of-the-art momentum techniques rely on low-order gradient powers, such as the squared first-order gradient, gradients raised to powers greater than two remain largely unexplored. In this work, we introduce the concept of high-order momentum, in which momentum is constructed from higher-power gradients, taking the third power of the first-order gradient as a representative case. We provide both theoretical and empirical support for this approach. Theoretically, we show that incorporating third-power gradients tightens the convergence bounds of gradient-based optimizers for both convex and smooth nonconvex problems. Empirically, we validate these findings through extensive experiments on convex, smooth nonconvex, and nonsmooth nonconvex optimization tasks. In all cases, high-order momentum consistently outperforms conventional low-order momentum methods.
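To make the idea of third-power momentum concrete, the sketch below shows one plausible Adam-style update in which the first-moment accumulator tracks the cubed gradient rather than the gradient itself. This is an illustrative assumption, not the paper's exact HOME-3 algorithm: the update rule, cube-root rescaling, and hyperparameter names (`beta1`, `beta2`, `eps`) are hypothetical choices made for the example.

```python
import numpy as np

def third_power_momentum_step(theta, grad, m, v,
                              lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One illustrative step of a third-power-momentum update.

    NOTE: a hypothetical sketch of the abstract's idea (momentum built
    from the cubed gradient), NOT the paper's exact HOME-3 algorithm.
    """
    # Third-power momentum: cubing preserves the gradient's sign while
    # amplifying large components relative to small ones.
    m = beta1 * m + (1 - beta1) * grad ** 3
    # Conventional second-moment term, as in Adam.
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Cube root maps the accumulator back to the gradient's scale and sign.
    m_hat = np.cbrt(m)
    theta = theta - lr * m_hat / (np.sqrt(v) + eps)
    return theta, m, v
```

On a simple quadratic, iterating this step drives the parameter toward the minimizer much like Adam, since the cube-then-cube-root pair keeps the step direction aligned with the averaged gradient sign.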