🤖 AI Summary
This paper addresses the challenge of Bayesian regret analysis for Thompson Sampling in bandit problems with infinite, continuous action spaces. To this end, it extends the Russo–Van Roy information-theoretic framework and the Dong–Van Roy rate-distortion method to infinite-dimensional settings. By characterizing the geometric structure of the action space through metric complexity measures, such as covering numbers, and exploiting Lipschitz continuity of the expected reward, the authors derive an explicit Bayesian regret upper bound that scales with the action space's intrinsic complexity. This bound removes the classical finite-action assumption and achieves near-optimal rates under Lipschitz conditions. The key contributions are: (1) a novel information-theoretic analytical paradigm tailored to infinite, continuous action spaces; (2) the first explicit regret bound that jointly captures the geometry of the action space and the growth of regret; and (3) a scalable analytical tool for regret analysis in high-dimensional or continuous-control bandit problems.
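To make the setting concrete, the following is a minimal, self-contained sketch (not the paper's algorithm) of Thompson Sampling run on an epsilon-net of a continuous action space, with a mean-reward function assumed Lipschitz in the action. The action set [0, 1], the net width `EPS`, the reward function `true_mean`, and the independent per-arm Gaussian model are hypothetical choices made only for illustration.

```python
# Illustrative sketch only: Thompson Sampling on an epsilon-net of a continuous
# action space A = [0, 1], with a hypothetical 1-Lipschitz mean-reward function.
import numpy as np

rng = np.random.default_rng(0)

# Epsilon-net over A = [0, 1]: roughly N(A, eps) ~ 1/eps points cover the space,
# and Lipschitz continuity bounds the per-round cost of playing only net points.
EPS = 0.05
actions = np.linspace(0.0, 1.0, int(1.0 / EPS) + 1)
K = len(actions)

# Conjugate Gaussian model per net point: prior N(0, 1), observation noise N(0, 1).
mu = np.zeros(K)    # posterior means
prec = np.ones(K)   # posterior precisions

def true_mean(a):
    # Hypothetical 1-Lipschitz mean reward; its continuous optimum is 0.8 at a = 0.3.
    return 0.8 - abs(a - 0.3)

T = 2000
best = 0.8          # continuous optimum of true_mean
regret = 0.0

for t in range(T):
    # Thompson Sampling: draw one posterior sample per action, play the argmax.
    samples = rng.normal(mu, 1.0 / np.sqrt(prec))
    i = int(np.argmax(samples))
    reward = true_mean(actions[i]) + rng.normal(0.0, 1.0)

    # Conjugate Gaussian posterior update for the chosen arm.
    prec[i] += 1.0
    mu[i] += (reward - mu[i]) / prec[i]

    regret += best - true_mean(actions[i])

print(f"cumulative regret over {T} rounds on a {K}-point epsilon-net: {regret:.1f}")
```

The discretization step is where the geometric quantities enter: the net has on the order of 1/EPS points, and Lipschitz continuity caps the per-round loss from restricting play to the net at roughly L · EPS. This is the kind of trade-off that a covering-number-based regret bound makes explicit.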
📝 Abstract
This paper studies the Bayesian regret of the Thompson Sampling algorithm for bandit problems, building on the information-theoretic framework introduced by Russo and Van Roy (2015). Specifically, it extends the rate-distortion analysis of Dong and Van Roy (2018), which provides near-optimal bounds for linear bandits. A limitation of these results is the assumption of a finite action space. We address this by extending the analysis to settings with infinite, continuous action spaces. Additionally, we specialize our results to bandit problems whose expected rewards are Lipschitz continuous with respect to a metric on the action space, deriving a regret bound that explicitly accounts for the complexity of the action space.
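For reference, the quantities the abstract invokes can be stated with the following standard definitions. The notation is ours, not necessarily the paper's: $f_\theta$ is the mean reward under the unknown parameter $\theta$, $L$ the Lipschitz constant, $d$ a metric on the action set $\mathcal{A}$, and $N(\mathcal{A}, \varepsilon)$ the covering number.

```latex
% Bayesian regret over T rounds of a policy playing actions A_1, ..., A_T
% (expectation over the prior on theta, the rewards, and the algorithm's randomness):
\mathrm{BayesRegret}(T)
  \;=\; \mathbb{E}\!\left[\sum_{t=1}^{T}
        \Bigl(\sup_{a \in \mathcal{A}} f_{\theta}(a) - f_{\theta}(A_t)\Bigr)\right]

% Lipschitz continuity of the mean reward with respect to a metric d on A:
\lvert f_{\theta}(a) - f_{\theta}(a') \rvert \;\le\; L \, d(a, a')
  \qquad \text{for all } a, a' \in \mathcal{A}

% Covering number: the smallest number of epsilon-balls needed to cover A,
% i.e. the metric complexity measure through which the action space enters the bound:
N(\mathcal{A}, \varepsilon)
  \;=\; \min\Bigl\{\, n : \exists\, a_1, \dots, a_n \in \mathcal{A}
        \text{ with } \mathcal{A} \subseteq \textstyle\bigcup_{i=1}^{n} B(a_i, \varepsilon) \Bigr\}
```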