Generalized Linear Bandits: Almost Optimal Regret with One-Pass Update

📅 2025-07-15

📈 Citations: 0

✨ Influential: 0

career value

238K/year

🤖 AI Summary

Generalized Linear Bandits (GLBs) model non-Gaussian rewards—e.g., Bernoulli or Poisson—via nonlinear link functions, but their nonlinearity impedes simultaneous statistical and computational efficiency: existing methods either achieve optimal regret at high per-round cost or enable efficient updates at the expense of statistical suboptimality. This paper proposes the first GLB algorithm that attains both statistical and computational efficiency. It constructs a tight confidence set via online mirror descent and achieves near-optimal cumulative regret with only a single update per round. The theoretical analysis integrates maximum-likelihood estimation properties with a mixed-loss-based confidence bound. The algorithm incurs constant per-round time and space complexity—O(1)—thereby breaking the classical statistical-computational trade-off and significantly improving decision-making efficiency in high-dimensional contextual settings.

Technology Category

Application Category

📝 Abstract

We study the generalized linear bandit (GLB) problem, a contextual multi-armed bandit framework that extends the classical linear model by incorporating a non-linear link function, thereby modeling a broad class of reward distributions such as Bernoulli and Poisson. While GLBs are widely applicable to real-world scenarios, their non-linear nature introduces significant challenges in achieving both computational and statistical efficiency. Existing methods typically trade off between two objectives, either incurring high per-round costs for optimal regret guarantees or compromising statistical efficiency to enable constant-time updates. In this paper, we propose a jointly efficient algorithm that attains a nearly optimal regret bound with $mathcal{O}(1)$ time and space complexities per round. The core of our method is a tight confidence set for the online mirror descent (OMD) estimator, which is derived through a novel analysis that leverages the notion of mix loss from online prediction. The analysis shows that our OMD estimator, even with its one-pass updates, achieves statistical efficiency comparable to maximum likelihood estimation, thereby leading to a jointly efficient optimistic method.

Problem

Research questions and friction points this paper is trying to address.

Achieving optimal regret in generalized linear bandits efficiently

Balancing computational and statistical efficiency in bandit algorithms

Developing one-pass update methods for online mirror descent

Innovation

Methods, ideas, or system contributions that make the work stand out.

One-pass update with OMD estimator

Tight confidence set via mix loss

Nearly optimal regret with O(1) complexity

🔎 Similar Papers

No similar papers found.

Amazon

Arlington, VA, USA / Bellevue, WA, USA / Boston, MA, USA

Liquid Labs - Research Engineer

Liquid AI

San Francisco, USA / Boston, USA

Research Engineer, Monetization AI