🤖 AI Summary
This paper studies the generalized linear contextual bandit problem under limited adaptivity, where the number of policy updates is constrained by a budget $M$, with the goal of minimizing regret. For the two prevalent limited-adaptivity settings (fixed and adaptive update schedules), the authors propose $\texttt{B-GLinCB}$, which employs batched updates and a novel confidence-region construction, and $\texttt{RS-GLinCB}$, which incorporates randomization and a tighter analysis that avoids the non-linearity parameter $\kappa$. Both algorithms achieve $\tilde{O}(\sqrt{T})$ regret free of any dependence on $\kappa$: $\texttt{B-GLinCB}$ when $M = \Omega(\log \log T)$ and contexts are stochastic, and $\texttt{RS-GLinCB}$ even for adversarially generated contexts. Notably, $\texttt{RS-GLinCB}$ requires only $\tilde{O}(\log^2 T)$ policy updates, drastically reducing update frequency while preserving these guarantees.
📝 Abstract
We study the generalized linear contextual bandit problem within the constraints of limited adaptivity. In this paper, we present two algorithms, $\texttt{B-GLinCB}$ and $\texttt{RS-GLinCB}$, that address, respectively, two prevalent limited adaptivity settings. Given a budget $M$ on the number of policy updates, in the first setting the algorithm must decide upfront the $M$ rounds at which it will update its policy, while in the second setting it can adaptively perform $M$ policy updates during its course. For the first setting, we design an algorithm, $\texttt{B-GLinCB}$, that incurs $\tilde{O}(\sqrt{T})$ regret when $M = \Omega(\log \log T)$ and the arm feature vectors are generated stochastically. For the second setting, we design an algorithm, $\texttt{RS-GLinCB}$, that updates its policy $\tilde{O}(\log^2 T)$ times and achieves a regret of $\tilde{O}(\sqrt{T})$ even when the arm feature vectors are adversarially generated. Notably, in these bounds we eliminate the dependence on a key instance-dependent parameter $\kappa$ that captures the non-linearity of the underlying reward model. Our novel approach for removing this dependence for generalized linear contextual bandits might be of independent interest.