Generalized Linear Bandits: Almost Optimal Regret with One-Pass Update

📅 2025-07-15
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Generalized Linear Bandits (GLBs) model non-Gaussian rewards—e.g., Bernoulli or Poisson—via nonlinear link functions, but their nonlinearity impedes simultaneous statistical and computational efficiency: existing methods either achieve optimal regret at high per-round cost or enable efficient updates at the expense of statistical suboptimality. This paper proposes the first GLB algorithm that attains both statistical and computational efficiency. It constructs a tight confidence set via online mirror descent and achieves near-optimal cumulative regret with only a single update per round. The theoretical analysis integrates maximum-likelihood estimation properties with a mixed-loss-based confidence bound. The algorithm incurs constant per-round time and space complexity—O(1)—thereby breaking the classical statistical-computational trade-off and significantly improving decision-making efficiency in high-dimensional contextual settings.

Technology Category

Application Category

📝 Abstract
We study the generalized linear bandit (GLB) problem, a contextual multi-armed bandit framework that extends the classical linear model by incorporating a non-linear link function, thereby modeling a broad class of reward distributions such as Bernoulli and Poisson. While GLBs are widely applicable to real-world scenarios, their non-linear nature introduces significant challenges in achieving both computational and statistical efficiency. Existing methods typically trade off between two objectives, either incurring high per-round costs for optimal regret guarantees or compromising statistical efficiency to enable constant-time updates. In this paper, we propose a jointly efficient algorithm that attains a nearly optimal regret bound with $mathcal{O}(1)$ time and space complexities per round. The core of our method is a tight confidence set for the online mirror descent (OMD) estimator, which is derived through a novel analysis that leverages the notion of mix loss from online prediction. The analysis shows that our OMD estimator, even with its one-pass updates, achieves statistical efficiency comparable to maximum likelihood estimation, thereby leading to a jointly efficient optimistic method.
Problem

Research questions and friction points this paper is trying to address.

Achieving optimal regret in generalized linear bandits efficiently
Balancing computational and statistical efficiency in bandit algorithms
Developing one-pass update methods for online mirror descent
Innovation

Methods, ideas, or system contributions that make the work stand out.

One-pass update with OMD estimator
Tight confidence set via mix loss
Nearly optimal regret with O(1) complexity
🔎 Similar Papers
No similar papers found.