🤖 AI Summary
In stochastic linear bandits with possibly infinite action sets, existing ensemble-sampling methods incur high computational overhead because the ensemble size must scale linearly with the time horizon $T$.
Method: This paper proposes a novel ensemble sampling framework that integrates Bayesian posterior approximation, linear function approximation, and concentration-inequality analysis.
Contribution/Results: Unlike conventional approaches requiring $O(T)$ base learners, our method is the first to achieve a lightweight ensemble of only $O(d \log T)$ learners in structured bandits—breaking the linear dependence on $T$. We establish a regret upper bound of $\tilde{O}((d \log T)^{5/2} \sqrt{T})$ in $d$-dimensional linear environments over horizon $T$, approaching the optimal $\tilde{O}(\sqrt{T})$ benchmark. The framework naturally accommodates infinite action spaces and significantly improves computational efficiency, offering a new paradigm for scalable, high-dimensional, long-horizon online decision-making under large or infinite action sets.
📝 Abstract
We provide the first useful and rigorous analysis of ensemble sampling for the stochastic linear bandit setting. In particular, we show that, under standard assumptions, for a $d$-dimensional stochastic linear bandit with an interaction horizon $T$, ensemble sampling with an ensemble of size of order $d \log T$ incurs regret at most of the order $(d \log T)^{5/2} \sqrt{T}$. Ours is the first result in any structured setting not to require the size of the ensemble to scale linearly with $T$ -- which defeats the purpose of ensemble sampling -- while obtaining near $\smash{\sqrt{T}}$ order regret. Our result is also the first to allow for infinite action sets.
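To make the setting concrete, here is a minimal sketch of ensemble sampling for a linear bandit: maintain a small ensemble of regularized least-squares estimates, each trained on independently perturbed copies of the observed rewards, and at each round act greedily with respect to a uniformly sampled ensemble member. This is an illustrative toy, not the paper's exact algorithm or analysis; the finite action set, noise scales, and constant ensemble size are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

d, m, T = 3, 8, 500           # dimension, ensemble size, horizon (illustrative)
lam, sigma = 1.0, 0.1         # ridge regularization, reward-noise scale
theta_star = rng.normal(size=d)
theta_star /= np.linalg.norm(theta_star)
actions = rng.normal(size=(20, d))            # finite action set for illustration
actions /= np.linalg.norm(actions, axis=1, keepdims=True)

V = lam * np.eye(d)           # shared regularized Gram matrix
b = np.zeros((m, d))          # one perturbed regression target per model

for t in range(T):
    i = rng.integers(m)                          # sample an ensemble member uniformly
    theta_i = np.linalg.solve(V, b[i])           # its ridge-regression estimate
    x = actions[np.argmax(actions @ theta_i)]    # act greedily for that member
    r = x @ theta_star + sigma * rng.normal()    # observe a noisy linear reward
    V += np.outer(x, x)
    # each model regresses on an independently perturbed copy of the reward,
    # which keeps the ensemble members diverse (the source of exploration)
    b += x * (r + sigma * rng.normal(size=(m, 1)))
```

The per-round cost scales with the ensemble size $m$, which is why the paper's $O(d \log T)$ bound on $m$ (rather than $O(T)$) is what makes the method computationally lightweight.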