🤖 AI Summary
This paper addresses the challenge of Bayesian regret analysis for Thompson Sampling in bandit problems with infinite, continuous action spaces. To this end, it extends the Russo–Van Roy information-theoretic framework and the Dong–Van Roy rate-distortion method to infinite-dimensional settings. By characterizing the geometric structure of the action space through metric complexity measures, such as covering numbers, and exploiting Lipschitz continuity of the expected reward, the authors derive an explicit Bayesian regret upper bound that scales with the action space's intrinsic complexity. This bound removes the classical finite-action assumption and achieves near-optimal rates under Lipschitz conditions. The key contributions are: (1) a novel information-theoretic analytical paradigm tailored to infinite, continuous action spaces; (2) the first explicit regret bound that jointly captures the geometry of the action space and the growth of regret; and (3) a scalable analytical tool for regret analysis in high-dimensional or continuous-control bandit problems.
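To make the setting concrete, the following is a minimal, self-contained sketch (not the paper's algorithm) of Thompson Sampling run on an epsilon-net of a continuous action space, with a mean-reward function assumed Lipschitz in the action. The action set [0, 1], the net width `EPS`, the reward function `true_mean`, and the independent per-arm Gaussian model are hypothetical choices made only for illustration.

```python
# Illustrative sketch only: Thompson Sampling on an epsilon-net of a continuous
# action space A = [0, 1], with a hypothetical 1-Lipschitz mean-reward function.
import numpy as np

rng = np.random.default_rng(0)

# Epsilon-net over A = [0, 1]: roughly N(A, eps) ~ 1/eps points cover the space,
# and Lipschitz continuity bounds the per-round cost of playing only net points.
EPS = 0.05
actions = np.linspace(0.0, 1.0, int(1.0 / EPS) + 1)
K = len(actions)

# Conjugate Gaussian model per net point: prior N(0, 1), observation noise N(0, 1).
mu = np.zeros(K)    # posterior means
prec = np.ones(K)   # posterior precisions

def true_mean(a):
    # Hypothetical 1-Lipschitz mean reward; its continuous optimum is 0.8 at a = 0.3.
    return 0.8 - abs(a - 0.3)

T = 2000
best = 0.8          # continuous optimum of true_mean
regret = 0.0

for t in range(T):
    # Thompson Sampling: draw one posterior sample per action, play the argmax.
    samples = rng.normal(mu, 1.0 / np.sqrt(prec))
    i = int(np.argmax(samples))
    reward = true_mean(actions[i]) + rng.normal(0.0, 1.0)

    # Conjugate Gaussian posterior update for the chosen arm.
    prec[i] += 1.0
    mu[i] += (reward - mu[i]) / prec[i]

    regret += best - true_mean(actions[i])

print(f"cumulative regret over {T} rounds on a {K}-point epsilon-net: {regret:.1f}")
```

The discretization step is where the geometric quantities enter: the net has on the order of 1/EPS points, and Lipschitz continuity caps the per-round loss from restricting play to the net at roughly L · EPS. This is the kind of trade-off that a covering-number-based regret bound makes explicit.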
📝 Abstract
This paper studies the Bayesian regret of the Thompson Sampling algorithm for bandit problems, building on the information-theoretic framework introduced by Russo and Van Roy (2015). Specifically, it extends the rate-distortion analysis of Dong and Van Roy (2018), which provides near-optimal bounds for linear bandits. A limitation of these results is the assumption of a finite action space. We address this by extending the analysis to settings with infinite, continuous action spaces. Additionally, we specialize our results to bandit problems whose expected rewards are Lipschitz continuous with respect to a metric on the action space, deriving a regret bound that explicitly accounts for the complexity of the action space.
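For reference, the quantities the abstract invokes can be stated with the following standard definitions. The notation is ours, not necessarily the paper's: $f_\theta$ is the mean reward under the unknown parameter $\theta$, $L$ the Lipschitz constant, $d$ a metric on the action set $\mathcal{A}$, and $N(\mathcal{A}, \varepsilon)$ the covering number.

```latex
% Bayesian regret over T rounds of a policy playing actions A_1, ..., A_T
% (expectation over the prior on theta, the rewards, and the algorithm's randomness):
\mathrm{BayesRegret}(T)
  \;=\; \mathbb{E}\!\left[\sum_{t=1}^{T}
        \Bigl(\sup_{a \in \mathcal{A}} f_{\theta}(a) - f_{\theta}(A_t)\Bigr)\right]

% Lipschitz continuity of the mean reward with respect to a metric d on A:
\lvert f_{\theta}(a) - f_{\theta}(a') \rvert \;\le\; L \, d(a, a')
  \qquad \text{for all } a, a' \in \mathcal{A}

% Covering number: the smallest number of epsilon-balls needed to cover A,
% i.e. the metric complexity measure through which the action space enters the bound:
N(\mathcal{A}, \varepsilon)
  \;=\; \min\Bigl\{\, n : \exists\, a_1, \dots, a_n \in \mathcal{A}
        \text{ with } \mathcal{A} \subseteq \textstyle\bigcup_{i=1}^{n} B(a_i, \varepsilon) \Bigr\}
```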