🤖 AI Summary
This paper challenges the classical square-summability condition (∑αₙ² < ∞) on step sizes in stochastic approximation (SA), proving for the first time that it is not necessary for convergence. Focusing on power-law step sizes αₙ = α₀ n^(−ρ) with ρ ∈ (0,1) and general Markovian noise, it establishes fine-grained characterizations of convergence, bias, and variance. Key contributions are: (1) the first necessary and sufficient condition for vanishing bias when ρ ≤ 1/2; (2) a proof that Polyak–Ruppert averaging achieves the optimal CLT covariance even for ρ ∈ (0,1/2], though bias dominance slows convergence; and (3) almost-sure and Lₚ convergence for all ρ ∈ (0,1), with an explicit characterization of mean-square error degradation to O(αₙ²). Collectively, these results fundamentally reshape the theoretical foundations of SA step-size design.
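For concreteness, here is a sketch of the SA recursion and the Polyak–Ruppert average the summary refers to. The symbols θ, f, and Φ are generic SA notation assumed for illustration, not taken verbatim from the paper:

```latex
% Stochastic approximation with power-law step size (generic notation):
\[
  \theta_{n+1} = \theta_n + \alpha_{n+1}\, f(\theta_n, \Phi_{n+1}),
  \qquad \alpha_n = \alpha_0\, n^{-\rho}, \quad \rho \in (0,1),
\]
% Polyak--Ruppert average of the iterates:
\[
  \bar{\theta}_n = \frac{1}{n} \sum_{k=1}^{n} \theta_k .
\]
```

Note that ∑αₙ² < ∞ holds only for ρ > 1/2, which is why classical theory excludes ρ ≤ 1/2; the paper's results cover the full range ρ ∈ (0,1).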
📝 Abstract
Many machine learning and optimization algorithms are built upon the framework of stochastic approximation (SA), for which the selection of step-size (or learning rate) is essential for success. For the sake of clarity, this paper focuses on the special case $\alpha_n = \alpha_0 n^{-\rho}$ at iteration $n$, with $\rho \in [0,1]$ and $\alpha_0 > 0$ design parameters. It is most common in practice to take $\rho = 0$ (constant step-size), while in more theoretically oriented papers a vanishing step-size is preferred. In particular, with $\rho \in (1/2, 1)$ it is known that on applying the averaging technique of Polyak and Ruppert, the mean-squared error (MSE) converges at the optimal rate of $O(1/n)$ and the covariance in the central limit theorem (CLT) is minimal in a precise sense. The paper revisits step-size selection in a general Markovian setting. Under readily verifiable assumptions, the following conclusions are obtained provided $0 < \rho < 1$:
• Parameter estimates converge with probability one, and also in $L_p$ for any $p \ge 1$.
• The MSE may converge very slowly for small $\rho$, of order $O(\alpha_n^2)$ even with averaging.
• For linear stochastic approximation the source of slow convergence is identified: for any $\rho \in (0,1)$, averaging results in estimates for which the error $\textit{covariance}$ vanishes at the optimal rate, and moreover the CLT covariance is optimal in the sense of Polyak and Ruppert. However, necessary and sufficient conditions are obtained under which the $\textit{bias}$ converges to zero at rate $O(\alpha_n)$.
This is the first paper to obtain such strong conclusions while allowing for $\rho \le 1/2$. A major conclusion is that the choice of $\rho = 0$, or even $\rho < 1/2$, is justified only in select settings: in general, bias may preclude fast convergence.
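The mechanics of power-law step sizes and Polyak–Ruppert averaging can be seen in a minimal sketch. The toy linear problem below (the matrix `A`, `alpha0`, and the i.i.d. Gaussian noise are illustrative assumptions, not the paper's Markovian setting or experiments) runs linear SA for two values of ρ and tracks the running average:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear SA target: find theta* solving A @ theta* = b from noisy
# observations of b - A @ theta. Illustrative choices only.
A = np.array([[2.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, 1.0])
theta_star = np.linalg.solve(A, b)

def run_sa(rho, alpha0=0.5, n_iters=100_000):
    """Linear SA with power-law step size alpha_n = alpha0 * n**(-rho),
    plus a running Polyak-Ruppert average of the iterates."""
    theta = np.zeros(2)
    avg = np.zeros(2)
    for n in range(1, n_iters + 1):
        alpha = alpha0 * n ** (-rho)
        noise = rng.normal(size=2)        # i.i.d. noise for simplicity
        obs = b - A @ theta + noise       # noisy observation of b - A @ theta
        theta = theta + alpha * obs       # SA update
        avg += (theta - avg) / n          # running Polyak-Ruppert average
    return theta, avg

for rho in (0.3, 0.7):
    theta, avg = run_sa(rho)
    print(f"rho={rho}: |theta - theta*| = {np.linalg.norm(theta - theta_star):.4f}, "
          f"|avg - theta*| = {np.linalg.norm(avg - theta_star):.4f}")
```

With i.i.d. mean-zero noise the bias phenomenon the paper analyzes (which arises under Markovian noise) does not appear; the sketch only illustrates that the averaged iterate concentrates near $\theta^*$ for both $\rho < 1/2$ and $\rho > 1/2$, consistent with the covariance result stated above.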