An Information-Theoretic Analysis of Thompson Sampling with Infinite Action Spaces

📅 2025-02-04
🤖 AI Summary
This paper addresses Bayesian regret analysis for Thompson Sampling in bandit problems with infinite and continuous action spaces. It extends the Russo–Van Roy information-theoretic framework and the Dong–Van Roy rate-distortion analysis, both of which assume a finite action space, to this setting. For bandits whose expected reward is Lipschitz continuous in the action, the authors derive a Bayesian regret upper bound that explicitly accounts for the complexity of the action space, characterized through metric complexity measures such as covering numbers. The key contributions are: (1) an information-theoretic regret analysis that applies to infinite and continuous action spaces; (2) an explicit regret bound for Lipschitz rewards that depends on the complexity of the action space; and (3) an analytical tool that can be applied to regret analysis in continuous-action bandit problems.

📝 Abstract
This paper studies the Bayesian regret of the Thompson Sampling algorithm for bandit problems, building on the information-theoretic framework introduced by Russo and Van Roy (2015). Specifically, it extends the rate-distortion analysis of Dong and Van Roy (2018), which provides near-optimal bounds for linear bandits. A limitation of these results is the assumption of a finite action space. We address this by extending the analysis to settings with infinite and continuous action spaces. Additionally, we specialize our results to bandit problems with expected rewards that are Lipschitz continuous with respect to the action space, deriving a regret bound that explicitly accounts for the complexity of the action space.
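
For orientation, the finite-action bound from the Russo and Van Roy framework that the abstract builds on can be written as below. This is a sketch in assumed notation (the symbols $A^*$, $A_t$, $Y_t$, $R$, $\Gamma_t$, and $\bar{\Gamma}$ are not taken from this page): $A^*$ is the optimal action, $A_t$ the action played at round $t$, $Y_t$ the resulting observation, and $\mathbb{E}_t$, $I_t$ are conditioned on the history up to round $t$.

```latex
% Information ratio at round t (assumed notation)
\Gamma_t = \frac{\bigl(\mathbb{E}_t[R(A^*) - R(A_t)]\bigr)^2}{I_t\bigl(A^*; (A_t, Y_t)\bigr)}

% If \Gamma_t \le \bar{\Gamma} for every t, the Bayesian regret over T rounds satisfies
\mathbb{E}[\mathrm{Regret}(T)] \le \sqrt{\bar{\Gamma}\, H(A^*)\, T}
```

The bound depends on the entropy $H(A^*)$ of the optimal action, which can be infinite when the action space is infinite. This is why the finite-action assumption matters. The rate-distortion analysis replaces $A^*$ with a compressed statistic at the cost of an additional distortion term.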
Problem

Research questions and friction points this paper is trying to address.

Extends Thompson Sampling analysis to infinite action spaces
Addresses Bayesian regret for continuous action bandit problems
Derives regret bounds for Lipschitz continuous reward functions
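
To make the setting concrete, here is a minimal sketch of Thompson Sampling on a continuous action space with a Lipschitz reward. It is not the paper's method or analysis: it simply runs Thompson Sampling on the centers of an eps-cover of [0, 1]. The Gaussian prior, the noise level, the reward function, and the names `epsilon_net` and `thompson_sampling` are all assumptions made for illustration.

```python
import math
import random


def epsilon_net(eps):
    """Centers of an eps-cover of [0, 1]: every point is within eps of a center."""
    n = math.ceil(1.0 / (2.0 * eps))
    return [min(1.0, (2 * i + 1) * eps) for i in range(n)]


def thompson_sampling(reward_fn, eps, horizon, best_value, noise_std=0.1, seed=0):
    """Run Thompson Sampling over the eps-net and return (cumulative regret, actions).

    Assumptions (illustrative only): independent N(0, 1) priors on the mean
    reward of each center, Gaussian observation noise with known noise_std,
    and best_value equal to the supremum of reward_fn over [0, 1].
    """
    rng = random.Random(seed)
    actions = epsilon_net(eps)
    post_mean = [0.0] * len(actions)   # posterior means, one per center
    post_prec = [1.0] * len(actions)   # posterior precisions (1 / variance)
    noise_prec = 1.0 / noise_std ** 2
    regret = 0.0

    for _ in range(horizon):
        # Sample one plausible mean reward per center from the posterior.
        samples = [rng.gauss(m, 1.0 / math.sqrt(p))
                   for m, p in zip(post_mean, post_prec)]
        # Play the center that is optimal under the sampled model.
        i = max(range(len(actions)), key=samples.__getitem__)
        y = reward_fn(actions[i]) + rng.gauss(0.0, noise_std)
        # Conjugate Gaussian update for the played center.
        post_mean[i] = (post_prec[i] * post_mean[i] + noise_prec * y) / (
            post_prec[i] + noise_prec)
        post_prec[i] += noise_prec
        # Regret is measured against the best action in the continuous space.
        regret += best_value - reward_fn(actions[i])

    return regret, actions


if __name__ == "__main__":
    # A 1-Lipschitz reward on [0, 1] with maximum value 1.0 at a = 0.37.
    def reward(a):
        return 1.0 - abs(a - 0.37)

    total, net = thompson_sampling(reward, eps=0.1, horizon=500, best_value=1.0)
    print(f"{len(net)} centers, cumulative regret {total:.2f}")
```

In this sketch, a finer cover reduces the unavoidable discretization gap (at most the Lipschitz constant times eps per round) but increases the number of centers to explore. A regret bound that depends on the complexity of the action space makes this kind of trade-off explicit.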
Innovation

Methods, ideas, or system contributions that make the work stand out.

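Extends the Russo–Van Roy information-theoretic framework and the Dong–Van Roy rate-distortion analysis beyond finite action spaces
Handles infinite and continuous action spaces in the Bayesian regret analysis of Thompson Sampling
Derives a regret bound for Lipschitz continuous rewards that explicitly accounts for the complexity of the action space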
Authors

Amaury Gouverneur
KTH Royal Institute of Technology
Information theory, Machine Learning

Borja Rodriguez Galvez
Division of Information Science and Engineering (ISE), KTH Royal Institute of Technology

Tobias Oechtering
Division of Information Science and Engineering (ISE), KTH Royal Institute of Technology

Mikael Skoglund
KTH Royal Institute of Technology
Information Theory, Communications, Signal Processing