🤖 AI Summary
Generalized Linear Bandits (GLBs) model non-Gaussian rewards—e.g., Bernoulli or Poisson—via nonlinear link functions, but their nonlinearity impedes simultaneous statistical and computational efficiency: existing methods either achieve optimal regret at high per-round cost or enable efficient updates at the expense of statistical suboptimality. This paper proposes the first GLB algorithm that attains both statistical and computational efficiency. It constructs a tight confidence set via online mirror descent and achieves near-optimal cumulative regret with only a single update per round. The theoretical analysis integrates maximum-likelihood estimation properties with a mixed-loss-based confidence bound. The algorithm incurs constant per-round time and space complexity—O(1)—thereby breaking the classical statistical-computational trade-off and significantly improving decision-making efficiency in high-dimensional contextual settings.
📝 Abstract
We study the generalized linear bandit (GLB) problem, a contextual multi-armed bandit framework that extends the classical linear model by incorporating a non-linear link function, thereby modeling a broad class of reward distributions such as Bernoulli and Poisson. While GLBs are widely applicable to real-world scenarios, their non-linear nature introduces significant challenges in achieving both computational and statistical efficiency. Existing methods typically trade one objective for the other, either incurring high per-round costs to obtain optimal regret guarantees or compromising statistical efficiency to enable constant-time updates. In this paper, we propose a jointly efficient algorithm that attains a nearly optimal regret bound with $\mathcal{O}(1)$ time and space complexity per round. The core of our method is a tight confidence set for the online mirror descent (OMD) estimator, which is derived through a novel analysis that leverages the notion of mix loss from online prediction. The analysis shows that our OMD estimator, even with its one-pass updates, achieves statistical efficiency comparable to maximum likelihood estimation, thereby leading to a jointly efficient optimistic method.
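To make the one-pass idea concrete, here is a minimal illustrative sketch of an OMD-style update for a Bernoulli (logistic) GLM. With the Euclidean mirror map, the OMD step reduces to online gradient descent on the per-round logistic loss, so each round touches only the current observation and the running estimate. This is a generic stand-in under assumed names (`omd_step`, `eta`, `theta_star`), not the paper's exact update rule or confidence-set construction.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def omd_step(theta, x, y, eta=0.1):
    """One-pass update: OMD with the Euclidean mirror map (i.e., online
    gradient descent) on the logistic loss for one observation (x, y).
    Illustrative only -- the paper's estimator and step size differ."""
    grad = (sigmoid(x @ theta) - y) * x  # gradient of the per-round logistic loss
    return theta - eta * grad

# Simulate T rounds; per-round cost does not grow with t (no data is stored).
rng = np.random.default_rng(0)
d, T = 5, 1000
theta_star = rng.normal(size=d)   # hypothetical true parameter
theta = np.zeros(d)
for _ in range(T):
    x = rng.normal(size=d)
    x /= np.linalg.norm(x)                        # unit-norm context
    y = rng.binomial(1, sigmoid(x @ theta_star))  # Bernoulli reward via the link
    theta = omd_step(theta, x, y)
```

In contrast, maximum likelihood estimation re-solves an optimization over all past observations each round; the point of the paper's analysis is that the cheap recursive update above can match MLE-level statistical efficiency when paired with the right confidence set.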