🤖 AI Summary
This paper investigates the global convergence of stochastic gradient multi-armed bandit (SG-Bandit) algorithms under arbitrary constant learning rates, focusing on non-ideal settings where standard smoothness and bounded-noise assumptions fail. Methodologically, it integrates tools from stochastic optimization, bandit theory, and probabilistic convergence analysis. The key contribution is the first rigorous proof that SG-Bandit converges almost surely to the globally optimal policy, even without smoothness, under non-stationary noise, and with only weak noise control, thereby eliminating reliance on learning-rate decay or strong regularity conditions. Crucially, the analysis uncovers an intrinsic balance between action sampling rates and the cumulative progress-to-noise ratio, which governs convergence behavior. This result substantially extends the theoretical applicability of stochastic gradient methods to high-noise, nonsmooth bandit environments.
📝 Abstract
We provide a new understanding of the stochastic gradient bandit algorithm by showing that it converges to a globally optimal policy almost surely using *any* constant learning rate. This result demonstrates that the stochastic gradient algorithm continues to balance exploration and exploitation appropriately even in scenarios where standard smoothness and noise control assumptions break down. The proofs are based on novel findings about action sampling rates and the relationship between cumulative progress and noise, and extend the current understanding of how simple stochastic gradient methods behave in bandit settings.
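For intuition, the algorithm under discussion can be illustrated with a minimal sketch. This is not the paper's code; it is a standard softmax-parameterized stochastic gradient bandit with a constant learning rate `eta`, where each update uses the unbiased REINFORCE-style gradient estimate from a single sampled action. The arm means, noise level, and step count below are illustrative assumptions.

```python
import numpy as np

def softmax(theta):
    # Numerically stable softmax over the logits theta.
    z = theta - theta.max()
    e = np.exp(z)
    return e / e.sum()

def sg_bandit(true_means, eta=1.0, steps=5000, seed=0):
    """Run a softmax stochastic gradient bandit with a CONSTANT learning rate.

    true_means : mean reward of each arm (illustrative values below)
    eta        : constant learning rate (no decay schedule)
    """
    rng = np.random.default_rng(seed)
    K = len(true_means)
    theta = np.zeros(K)  # policy logits, pi = softmax(theta)
    for _ in range(steps):
        pi = softmax(theta)
        a = rng.choice(K, p=pi)                 # sample an action from the policy
        r = true_means[a] + rng.normal(0, 0.1)  # noisy reward observation
        # Unbiased stochastic estimate of the policy gradient:
        # grad_theta E[r] estimated by r * (one_hot(a) - pi).
        grad = r * (np.eye(K)[a] - pi)
        theta += eta * grad                     # constant-step-size update
    return softmax(theta)

# With a clear gap between arm means, the policy concentrates on the best arm.
pi = sg_bandit(np.array([0.2, 0.5, 0.9]))
```

With a sufficiently large gap between arm means, the returned policy places most of its probability on the best arm even though `eta` is never decayed, which is the behavior the paper's almost-sure convergence result formalizes.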