Feel-Good Thompson Sampling for Contextual Bandits: a Markov Chain Monte Carlo Showdown

📅 2025-07-21
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study systematically evaluates Feel-Good Thompson Sampling (FG-TS) and its smoothed variant (SFG-TS) in contextual bandit settings—linear, logistic, and neural network—under both exact and approximate posterior inference (e.g., MCMC, stochastic gradient Bayesian sampling). To address insufficient exploration in high-dimensional regimes, FG-TS incorporates an optimistic reward mechanism that enhances exploration robustness. This is the first unified benchmarking effort assessing FG-TS variants under both exact and scalable approximate posteriors. Results show that FG-TS significantly outperforms standard Thompson Sampling in linear and logistic bandits, but exhibits diminished gains in neural network settings due to posterior approximation challenges. Empirically, FG-TS achieves strong overall performance with minimal implementation overhead, establishing it as a practical, competitive baseline for modern Bayesian bandits. The work provides rigorous empirical validation of FG-TS’s trade-offs across model classes and inference approximations, clarifying its applicability boundaries.

📝 Abstract
Thompson Sampling (TS) is widely used to address the exploration/exploitation tradeoff in contextual bandits, yet recent theory shows that it does not explore aggressively enough in high-dimensional problems. Feel-Good Thompson Sampling (FG-TS) addresses this by adding an optimism bonus that biases toward high-reward models, and it achieves the asymptotically minimax-optimal regret in the linear setting when posteriors are exact. However, its performance with *approximate* posteriors -- common in large-scale or neural problems -- has not been benchmarked. We provide the first systematic study of FG-TS and its smoothed variant (SFG-TS) across eleven real-world and synthetic benchmarks. To evaluate their robustness, we compare performance in settings with exact posteriors (linear and logistic bandits) to approximate regimes produced by fast but coarse stochastic-gradient samplers. Ablations over preconditioning, bonus scale, and prior strength reveal a trade-off: larger bonuses help when posterior samples are accurate, but hurt when sampling noise dominates. FG-TS generally outperforms vanilla TS in linear and logistic bandits, but tends to be weaker in neural bandits. Nevertheless, because FG-TS and its variants are competitive and easy to use, we recommend them as baselines in modern contextual-bandit benchmarks. Finally, we provide source code for all our experiments at https://github.com/SarahLiaw/ctx-bandits-mcmc-showdown.
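To make the "optimism bonus" concrete: in the formulation this line of work builds on (Zhang's original Feel-Good Thompson Sampling; the notation below is a sketch, not copied from this paper), the bonus enters the log-posterior directly, so sampled models are tilted toward those predicting high achievable reward:

```latex
% FG-TS sampling distribution over model parameters \theta after t rounds (sketch):
p(\theta \mid D_t) \;\propto\; p_0(\theta)\,
  \exp\!\Big( \textstyle\sum_{s=1}^{t}
    \big[ \, -\eta \,\big(f_\theta(x_s, a_s) - r_s\big)^2
    \;+\; \lambda \,\min\!\big(b,\ \max_{a} f_\theta(x_s, a)\big) \big] \Big)
```

The first term is the usual squared-loss likelihood; the second is the feel-good bonus with scale \(\lambda\) and cap \(b\). Setting \(\lambda = 0\) recovers vanilla TS, and \(\lambda\), \(b\), and \(\eta\) correspond to the bonus-scale and prior-strength knobs the paper ablates.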
Problem

Research questions and friction points this paper is trying to address.

Evaluates Feel-Good Thompson Sampling in high-dimensional bandits
Compares exact vs approximate posteriors in contextual bandits
Assesses robustness of FG-TS across synthetic and real benchmarks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimism bonus enhances exploration in TS
Systematic benchmark of FG-TS with approximate posteriors
Easy-to-use FG-TS variants as modern baselines
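As an illustration of how the pieces above fit together, here is a minimal, hypothetical sketch of FG-TS on a toy linear bandit, with the posterior sampled by unadjusted Langevin dynamics (the SGLD-style regime the paper studies). The environment, hyperparameters (`eta`, `lam`, the step-size schedule), and the uncapped bonus are illustrative assumptions, not the paper's actual settings:

```python
import numpy as np

# Hypothetical toy setup: linear contextual bandit with Gaussian features.
rng = np.random.default_rng(0)
d, K, T = 5, 4, 300                   # feature dim, arms per round, horizon
theta_star = rng.normal(size=d) / np.sqrt(d)

eta, lam, prior_var = 1.0, 0.1, 1.0   # likelihood temp, feel-good scale, prior var
n_mcmc = 20                           # Langevin steps per round (illustrative)

A = np.zeros((d, d))                  # sufficient statistics for the squared loss
b = np.zeros(d)
ctx_hist = []                         # full contexts, needed by the bonus term
theta = np.zeros(d)
regret = 0.0

def grad_log_post(theta, ctxs):
    """Gradient of the (sketched) FG-TS log-posterior."""
    g = -theta / prior_var + 2.0 * eta * (b - A @ theta)
    if len(ctxs):
        scores = ctxs @ theta                               # (t, K) arm scores
        best = ctxs[np.arange(len(ctxs)), scores.argmax(1)]
        # Feel-good bonus: pull theta toward models predicting high best-arm reward.
        g += lam * best.sum(axis=0)
    return g

for t in range(T):
    ctx = rng.normal(size=(K, d))     # this round's K candidate-arm features
    ctxs = np.asarray(ctx_hist) if ctx_hist else np.empty((0, K, d))
    step = 0.05 / (t + 1.0)           # decaying step keeps the chain stable
    for _ in range(n_mcmc):           # unadjusted Langevin (SGLD-style) sampling
        theta = (theta + step * grad_log_post(theta, ctxs)
                 + np.sqrt(2.0 * step) * rng.normal(size=d))
    a = int(np.argmax(ctx @ theta))   # act greedily on the posterior sample
    r = ctx[a] @ theta_star + 0.1 * rng.normal()
    A += np.outer(ctx[a], ctx[a]); b += r * ctx[a]; ctx_hist.append(ctx)
    regret += np.max(ctx @ theta_star) - ctx[a] @ theta_star

print(f"cumulative regret over {T} rounds: {regret:.2f}")
```

Setting `lam = 0.0` turns the same loop into vanilla Langevin Thompson Sampling, which makes this a convenient harness for reproducing the paper's FG-TS-vs-TS comparison in miniature.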
Emile Anand
Georgia Institute of Technology School of Computer Science
Sarah Liaw
California Institute of Technology
computational mathematics · statistics · machine learning