AI Summary
This work addresses a long-standing gap in the theoretical understanding of Ensemble Sampling (ES) for stochastic linear bandits, where tight high-probability regret bounds have remained elusive compared with Thompson Sampling. By modeling the exploration mechanism of discrete-time ES as a time-uniform boundary-crossing problem for multiple independent Brownian motions, the authors use tools from stochastic process theory, high-dimensional probability, and time-uniform concentration inequalities to establish a sharp regret upper bound. They prove that with an ensemble size of $\Theta(d\log n)$, ES achieves a high-probability regret bound of $\tilde O(d^{3/2}\sqrt n)$ while keeping computational cost comparable to Thompson Sampling, thereby closing the theoretical gap between the two methods. The analysis further suggests that a continuous-time perspective is natural, and perhaps necessary, for deriving such results.
Abstract
We analyse linear ensemble sampling (ES) with standard Gaussian perturbations in stochastic linear bandits. We show that for ensemble size $m=\Theta(d\log n)$, ES attains $\tilde O(d^{3/2}\sqrt n)$ high-probability regret, closing the gap to the Thompson sampling benchmark while keeping computation comparable. The proof brings a new perspective on randomized exploration in linear bandits by reducing the analysis to a time-uniform exceedance problem for $m$ independent Brownian motions. Intriguingly, this continuous-time lens is not forced; it appears natural--and perhaps necessary: the discrete-time problem seems to be asking for a continuous-time solution, and we know of no other way to obtain a sharp ES bound.
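For readers unfamiliar with the algorithm, the following is a minimal sketch of linear ensemble sampling in its commonly described form, not the authors' exact procedure. Everything beyond "standard Gaussian perturbations" and "$m=\Theta(d\log n)$" is an assumption made for illustration: the finite arm set, the ridge parameter `lam`, the unit perturbation scale, the constant 1 in `ensemble_size`, and uniform selection of one ensemble member per round. The function names are hypothetical.

```python
import math
import numpy as np


def ensemble_size(d, n, c=1.0):
    """Ensemble size m = ceil(c * d * log n); the constant c is an assumption."""
    return math.ceil(c * d * math.log(n))


def linear_ensemble_sampling(arms, theta_star, n_rounds, m,
                             lam=1.0, noise_sd=1.0, seed=0):
    """Run ES on a finite arm set and return the cumulative (pseudo-)regret.

    arms: (K, d) array of feature vectors; theta_star: (d,) true parameter.
    """
    rng = np.random.default_rng(seed)
    d = arms.shape[1]
    V = lam * np.eye(d)        # regularised design matrix, shared by all members
    b = np.zeros(d)            # sum of x_s * y_s over observed rewards
    # One perturbation vector per member: a Gaussian "prior" perturbation
    # plus a running sum of x_s * z_s^j with z_s^j standard normal.
    S = math.sqrt(lam) * rng.standard_normal((m, d))
    best = np.max(arms @ theta_star)
    regret = 0.0
    for _ in range(n_rounds):
        j = rng.integers(m)                      # pick one member uniformly
        theta_j = np.linalg.solve(V, b + S[j])   # perturbed ridge estimate
        x = arms[np.argmax(arms @ theta_j)]      # act greedily for that member
        y = x @ theta_star + noise_sd * rng.standard_normal()
        regret += best - x @ theta_star
        # Update shared statistics and every member's perturbation sum.
        V += np.outer(x, x)
        b += x * y
        S += np.outer(rng.standard_normal(m), x)
    return regret
```

In this sketch each row of `S` is a Gaussian random walk driven by fresh standard normal increments. Roughly speaking, this is the discrete-time object that the abstract's reduction to $m$ independent Brownian motions views in continuous time.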