Ensemble sampling for linear bandits: small ensembles suffice

📅 2023-11-14
🏛️ arXiv.org
📈 Citations: 1
Influential: 1
🤖 AI Summary
In stochastic linear multi-armed bandits with infinite action sets, existing ensemble-sampling analyses require the ensemble size to scale linearly with the time horizon $T$, incurring high computational overhead. Method: the paper proposes an ensemble sampling framework that combines Bayesian posterior approximation, linear function approximation, and concentration-inequality analysis. Contribution/Results: unlike conventional approaches requiring $O(T)$ base learners, this is the first method in a structured bandit setting to achieve a lightweight ensemble of only $O(d \log T)$ learners, breaking the linear dependence on $T$. The paper establishes a regret upper bound of $\tilde{O}((d \log T)^{5/2} \sqrt{T})$ in $d$-dimensional linear environments over horizon $T$, approaching the optimal $\tilde{O}(\sqrt{T})$ benchmark. The framework accommodates infinite action spaces and significantly improves computational efficiency, offering a scalable approach to high-dimensional, long-horizon online decision-making under large or infinite action sets.
📝 Abstract
We provide the first useful and rigorous analysis of ensemble sampling for the stochastic linear bandit setting. In particular, we show that, under standard assumptions, for a $d$-dimensional stochastic linear bandit with an interaction horizon $T$, ensemble sampling with an ensemble of size of order $d \log T$ incurs regret at most of the order $(d \log T)^{5/2} \sqrt{T}$. Ours is the first result in any structured setting not to require the size of the ensemble to scale linearly with $T$ -- which defeats the purpose of ensemble sampling -- while obtaining near $\smash{\sqrt{T}}$ order regret. Our result is also the first to allow for infinite action sets.
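To make the idea concrete, here is a minimal sketch of ensemble sampling for a linear bandit: the learner maintains a small ensemble of $m$ perturbed regularized least-squares estimates and, in each round, acts greedily with respect to a uniformly random ensemble member. This is an illustrative toy, not the paper's exact algorithm: the perturbation scheme is simplified, the action set here is finite (the paper allows infinite sets), and the helper name and parameters are assumptions. Per the paper, $m$ of order $d \log T$ suffices.

```python
import numpy as np

def ensemble_sampling_linear_bandit(actions, theta_star, T, m,
                                    lam=1.0, noise_sd=0.1, seed=0):
    """Toy ensemble sampling for a stochastic linear bandit.

    actions: (n, d) array of candidate actions; theta_star: (d,) true parameter.
    Maintains m perturbed least-squares estimates sharing one Gram matrix;
    each round a uniformly random member chooses the action. Returns cumulative regret.
    """
    rng = np.random.default_rng(seed)
    d = actions.shape[1]
    V = lam * np.eye(d)                 # shared regularized Gram matrix
    # Each member's target vector starts from an independently perturbed prior.
    b = rng.normal(0.0, noise_sd * np.sqrt(lam), size=(m, d))
    opt = (actions @ theta_star).max()  # best achievable mean reward
    regret = 0.0
    for _ in range(T):
        j = rng.integers(m)                           # sample an ensemble member
        theta_j = np.linalg.solve(V, b[j])            # its least-squares estimate
        a = actions[np.argmax(actions @ theta_j)]     # act greedily for member j
        r = a @ theta_star + rng.normal(0.0, noise_sd)
        V += np.outer(a, a)
        # Every member is updated with its own independently perturbed copy
        # of the observed reward, which keeps the ensemble diverse.
        b += a[None, :] * (r + rng.normal(0.0, noise_sd, size=m))[:, None]
        regret += opt - a @ theta_star
    return regret
```

On a well-separated instance the cumulative regret grows sublinearly, consistent with the paper's $\tilde{O}(\sqrt{T})$-type guarantee, though this sketch carries no such bound itself.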
Problem

Research questions and friction points this paper is trying to address.

Multi-Armed Bandit
Regret Minimization
Randomized Linear Games
Innovation

Methods, ideas, or system contributions that make the work stand out.

Ensemble Sampling
Regret Analysis
Linear Bandits
David Janz
University of Oxford
statistics · machine learning · reinforcement learning
A. Litvak
University of Alberta
Csaba Szepesvári
University of Alberta