🤖 AI Summary
This paper investigates the trade-off between convergence rate and robustness of generalized momentum methods (a class that includes Nesterov's accelerated gradient, heavy-ball, and gradient descent) under biased and possibly adversarial stochastic gradient errors. The authors develop a non-asymptotic risk-sensitive analysis framework, stated to be the first for momentum methods with biased gradients, establishing rigorous theoretical connections among the risk-sensitive index (RSI), the H∞ norm, and the large-deviation rate function. Leveraging 2×2 Riccati equations, convex-conjugate analysis, and the large-deviation principle, they characterize the finite-time behavior of the optimization dynamics under Gaussian and sub-Gaussian gradient noise. For quadratic objectives they derive explicit Pareto frontiers between robustness and convergence rate, together with high-probability convergence guarantees and sharp tail-probability decay bounds; for smooth strongly convex objectives they observe an analogous trade-off between RSI and convergence-rate bounds. Numerical experiments on robust regression validate the theoretical predictions.
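For orientation, the sketch below shows one common three-parameter form of the generalized momentum update on a strongly convex quadratic with noisy gradients; the parameter names (`alpha`, `beta`, `gamma`), the noise model, and the test problem are illustrative assumptions, not the paper's exact notation.

```python
import numpy as np

def gmm_step(x, x_prev, grad, alpha, beta, gamma, rng, noise_std=0.0):
    """One generalized momentum step with a (possibly noisy) gradient oracle.

    beta = gamma = 0  -> gradient descent
    gamma = 0         -> heavy-ball
    gamma = beta      -> Nesterov-type acceleration
    """
    y = x + gamma * (x - x_prev)                             # extrapolation point
    g = grad(y) + noise_std * rng.standard_normal(y.shape)   # noisy gradient
    x_next = x + beta * (x - x_prev) - alpha * g
    return x_next, x

# Strongly convex quadratic f(x) = 0.5 * x^T A x, minimized at the origin
A = np.diag([1.0, 10.0])            # mu = 1, L = 10
grad = lambda x: A @ x
rng = np.random.default_rng(0)

x, x_prev = np.ones(2), np.ones(2)
for _ in range(200):
    x, x_prev = gmm_step(x, x_prev, grad, alpha=0.09, beta=0.6, gamma=0.6,
                         rng=rng, noise_std=0.01)
print(x)  # close to the minimizer, up to noise-induced fluctuation
```

Setting `beta = gamma = 0` recovers gradient descent, `gamma = 0` heavy-ball, and `gamma = beta` a Nesterov-type update, which is how the three methods named above fit into a single parameterized family.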
📝 Abstract
We study trade-offs between convergence rate and robustness to gradient errors in first-order methods. Our focus is on generalized momentum methods (GMMs), a class that includes Nesterov's accelerated gradient, heavy-ball, and gradient descent. We allow stochastic gradient errors that may be adversarial and biased, and quantify robustness via the risk-sensitive index (RSI) from robust control theory. For quadratic objectives with i.i.d. Gaussian noise, we give closed-form expressions for the RSI using $2\times 2$ Riccati equations, revealing a Pareto frontier between RSI and convergence rate over stepsize and momentum choices. We prove a large-deviation principle for time-averaged suboptimality and show that the rate function is, up to scaling, the convex conjugate of the RSI. We further connect RSI to the $H_\infty$ norm, showing that stronger worst-case robustness (smaller $H_\infty$ norm) yields sharper decay of tail probabilities. Beyond quadratics, under biased sub-Gaussian gradient errors, we derive non-asymptotic bounds on a finite-time analogue of the RSI, giving finite-time high-probability guarantees and large-deviation bounds. We also observe an analogous trade-off between RSI and convergence-rate bounds for smooth strongly convex functions. To our knowledge, these are the first non-asymptotic guarantees and risk-sensitive analysis of GMMs with biased gradients. Numerical experiments on robust regression illustrate the results.
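The snippet below is a minimal Monte Carlo sketch of the kind of trade-off described above, not the paper's Riccati-based RSI computation: on an ill-conditioned quadratic with Gaussian gradient noise, it compares a conservative gradient-descent tuning with a Polyak-style heavy-ball tuning, using the noiseless contraction factor as the convergence rate and the stationary iterate variance as a crude robustness proxy. The function name `run_gmm`, the tunings, and the noise level are assumptions made purely for illustration.

```python
import numpy as np

def run_gmm(alpha, beta, A, noise_std=0.1, iters=30000, seed=0):
    """Heavy-ball-type iteration on f(x) = 0.5 * x^T A x with Gaussian gradient
    noise. Returns the noiseless asymptotic contraction factor and the empirical
    stationary value of E||x||^2 (a crude robustness proxy in this sketch)."""
    n = A.shape[0]
    # Noiseless iteration: x_{k+1} = ((1+beta)I - alpha*A) x_k - beta x_{k-1}
    T = np.block([[(1 + beta) * np.eye(n) - alpha * A, -beta * np.eye(n)],
                  [np.eye(n),                           np.zeros((n, n))]])
    rate = max(abs(np.linalg.eigvals(T)))        # asymptotic contraction factor

    rng = np.random.default_rng(seed)
    x, x_prev = np.ones(n), np.ones(n)
    samples = []
    for k in range(iters):
        g = A @ x + noise_std * rng.standard_normal(n)
        x, x_prev = x + beta * (x - x_prev) - alpha * g, x
        if k > iters // 2:                       # discard burn-in
            samples.append(np.sum(x**2))
    return rate, np.mean(samples)

A = np.diag([1.0, 100.0])                        # mu = 1, L = 100
mu, L = 1.0, 100.0
settings = {
    "gradient descent": (2.0 / (L + mu), 0.0),
    "heavy ball":       (4.0 / (np.sqrt(L) + np.sqrt(mu)) ** 2,
                         ((np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))) ** 2),
}
for name, (alpha, beta) in settings.items():
    rate, var = run_gmm(alpha, beta, A)
    print(f"{name}: rate ≈ {rate:.3f}, stationary E||x||^2 ≈ {var:.5f}")
```

With these settings, the accelerated tuning converges at a markedly better rate but settles at a noticeably larger noise-induced variance, which is the qualitative shape of the robustness–convergence trade-off the abstract describes.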