🤖 AI Summary
This paper investigates the trade-off between convergence rate and robustness of generalized momentum methods (a class that includes Nesterov's accelerated gradient, heavy-ball, and gradient descent) under biased and possibly adversarial stochastic gradient errors. The authors develop a non-asymptotic risk-sensitive analysis framework, stated to be the first for momentum methods with biased gradients, establishing rigorous theoretical connections among the risk-sensitive index (RSI), the H∞ norm, and the large-deviation rate function. Leveraging 2×2 Riccati equations, convex-conjugate analysis, and the large-deviation principle, they characterize the finite-time behavior of the optimization dynamics under Gaussian and sub-Gaussian gradient noise. For quadratic objectives they derive explicit Pareto frontiers between robustness and convergence rate, together with high-probability convergence guarantees and sharp tail-probability decay bounds; for smooth strongly convex objectives they observe an analogous trade-off between RSI and convergence-rate bounds. Numerical experiments on robust regression validate the theoretical predictions.
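For orientation, the sketch below shows one common three-parameter form of the generalized momentum update on a strongly convex quadratic with noisy gradients; the parameter names (`alpha`, `beta`, `gamma`), the noise model, and the test problem are illustrative assumptions, not the paper's exact notation.

```python
import numpy as np

def gmm_step(x, x_prev, grad, alpha, beta, gamma, rng, noise_std=0.0):
    """One generalized momentum step with a (possibly noisy) gradient oracle.

    beta = gamma = 0  -> gradient descent
    gamma = 0         -> heavy-ball
    gamma = beta      -> Nesterov-type acceleration
    """
    y = x + gamma * (x - x_prev)                             # extrapolation point
    g = grad(y) + noise_std * rng.standard_normal(y.shape)   # noisy gradient
    x_next = x + beta * (x - x_prev) - alpha * g
    return x_next, x

# Strongly convex quadratic f(x) = 0.5 * x^T A x, minimized at the origin
A = np.diag([1.0, 10.0])            # mu = 1, L = 10
grad = lambda x: A @ x
rng = np.random.default_rng(0)

x, x_prev = np.ones(2), np.ones(2)
for _ in range(200):
    x, x_prev = gmm_step(x, x_prev, grad, alpha=0.09, beta=0.6, gamma=0.6,
                         rng=rng, noise_std=0.01)
print(x)  # close to the minimizer, up to noise-induced fluctuation
```

Setting `beta = gamma = 0` recovers gradient descent, `gamma = 0` heavy-ball, and `gamma = beta` a Nesterov-type update, which is how the three methods named above fit into a single parameterized family.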
📝 Abstract
We study trade-offs between convergence rate and robustness to gradient errors in first-order methods. Our focus is on generalized momentum methods (GMMs), a class that includes Nesterov's accelerated gradient, heavy-ball, and gradient descent. We allow stochastic gradient errors that may be adversarial and biased, and quantify robustness via the risk-sensitive index (RSI) from robust control theory. For quadratic objectives with i.i.d. Gaussian noise, we give closed-form expressions for the RSI using $2\times 2$ Riccati equations, revealing a Pareto frontier between RSI and convergence rate over stepsize and momentum choices. We prove a large-deviation principle for time-averaged suboptimality and show that the rate function is, up to scaling, the convex conjugate of the RSI. We further connect RSI to the $H_\infty$ norm, showing that stronger worst-case robustness (smaller $H_\infty$ norm) yields sharper decay of tail probabilities. Beyond quadratics, under biased sub-Gaussian gradient errors, we derive non-asymptotic bounds on a finite-time analogue of the RSI, giving finite-time high-probability guarantees and large-deviation bounds. We also observe an analogous trade-off between RSI and convergence-rate bounds for smooth strongly convex functions. To our knowledge, these are the first non-asymptotic guarantees and risk-sensitive analysis of GMMs with biased gradients. Numerical experiments on robust regression illustrate the results.
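The snippet below is a minimal Monte Carlo sketch of the kind of trade-off described above, not the paper's Riccati-based RSI computation: on an ill-conditioned quadratic with Gaussian gradient noise, it compares a conservative gradient-descent tuning with a Polyak-style heavy-ball tuning, using the noiseless contraction factor as the convergence rate and the stationary iterate variance as a crude robustness proxy. The function name `run_gmm`, the tunings, and the noise level are assumptions made purely for illustration.

```python
import numpy as np

def run_gmm(alpha, beta, A, noise_std=0.1, iters=30000, seed=0):
    """Heavy-ball-type iteration on f(x) = 0.5 * x^T A x with Gaussian gradient
    noise. Returns the noiseless asymptotic contraction factor and the empirical
    stationary value of E||x||^2 (a crude robustness proxy in this sketch)."""
    n = A.shape[0]
    # Noiseless iteration: x_{k+1} = ((1+beta)I - alpha*A) x_k - beta x_{k-1}
    T = np.block([[(1 + beta) * np.eye(n) - alpha * A, -beta * np.eye(n)],
                  [np.eye(n),                           np.zeros((n, n))]])
    rate = max(abs(np.linalg.eigvals(T)))        # asymptotic contraction factor

    rng = np.random.default_rng(seed)
    x, x_prev = np.ones(n), np.ones(n)
    samples = []
    for k in range(iters):
        g = A @ x + noise_std * rng.standard_normal(n)
        x, x_prev = x + beta * (x - x_prev) - alpha * g, x
        if k > iters // 2:                       # discard burn-in
            samples.append(np.sum(x**2))
    return rate, np.mean(samples)

A = np.diag([1.0, 100.0])                        # mu = 1, L = 100
mu, L = 1.0, 100.0
settings = {
    "gradient descent": (2.0 / (L + mu), 0.0),
    "heavy ball":       (4.0 / (np.sqrt(L) + np.sqrt(mu)) ** 2,
                         ((np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))) ** 2),
}
for name, (alpha, beta) in settings.items():
    rate, var = run_gmm(alpha, beta, A)
    print(f"{name}: rate ≈ {rate:.3f}, stationary E||x||^2 ≈ {var:.5f}")
```

With these settings, the accelerated tuning converges at a markedly better rate but settles at a noticeably larger noise-induced variance, which is the qualitative shape of the robustness–convergence trade-off the abstract describes.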