Feel-Good Thompson Sampling for Contextual Bandits: a Markov Chain Monte Carlo Showdown

📅 2025-07-21
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study systematically evaluates Feel-Good Thompson Sampling (FG-TS) and its smoothed variant (SFG-TS) in contextual bandit settings—linear, logistic, and neural network—under both exact and approximate posterior inference (e.g., MCMC, stochastic gradient Bayesian sampling). To address insufficient exploration in high-dimensional regimes, FG-TS incorporates an optimistic reward mechanism that enhances exploration robustness. This is the first unified benchmarking effort assessing FG-TS variants under both exact and scalable approximate posteriors. Results show that FG-TS significantly outperforms standard Thompson Sampling in linear and logistic bandits, but exhibits diminished gains in neural network settings due to posterior approximation challenges. Empirically, FG-TS achieves strong overall performance with minimal implementation overhead, establishing it as a practical, competitive baseline for modern Bayesian bandits. The work provides rigorous empirical validation of FG-TS’s trade-offs across model classes and inference approximations, clarifying its applicability boundaries.

📝 Abstract
Thompson Sampling (TS) is widely used to address the exploration/exploitation tradeoff in contextual bandits, yet recent theory shows that it does not explore aggressively enough in high-dimensional problems. Feel-Good Thompson Sampling (FG-TS) addresses this by adding an optimism bonus that biases toward high-reward models, and it achieves the asymptotically minimax-optimal regret in the linear setting when posteriors are exact. However, its performance with *approximate* posteriors -- common in large-scale or neural problems -- has not been benchmarked. We provide the first systematic study of FG-TS and its smoothed variant (SFG-TS) across eleven real-world and synthetic benchmarks. To evaluate their robustness, we compare performance in settings with exact posteriors (linear and logistic bandits) to approximate regimes produced by fast but coarse stochastic-gradient samplers. Ablations over preconditioning, bonus scale, and prior strength reveal a trade-off: larger bonuses help when posterior samples are accurate, but hurt when sampling noise dominates. FG-TS generally outperforms vanilla TS in linear and logistic bandits, but tends to be weaker in neural bandits. Nevertheless, because FG-TS and its variants are competitive and easy to use, we recommend them as baselines in modern contextual-bandit benchmarks. Finally, we provide source code for all our experiments at https://github.com/SarahLiaw/ctx-bandits-mcmc-showdown.
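To make the "optimism bonus" concrete: in the formulation this line of work builds on (Zhang's original Feel-Good Thompson Sampling; the notation below is a sketch, not copied from this paper), the bonus enters the log-posterior directly, so sampled models are tilted toward those predicting high achievable reward:

```latex
% FG-TS sampling distribution over model parameters \theta after t rounds (sketch):
p(\theta \mid D_t) \;\propto\; p_0(\theta)\,
  \exp\!\Big( \textstyle\sum_{s=1}^{t}
    \big[ \, -\eta \,\big(f_\theta(x_s, a_s) - r_s\big)^2
    \;+\; \lambda \,\min\!\big(b,\ \max_{a} f_\theta(x_s, a)\big) \big] \Big)
```

The first term is the usual squared-loss likelihood; the second is the feel-good bonus with scale \(\lambda\) and cap \(b\). Setting \(\lambda = 0\) recovers vanilla TS, and \(\lambda\), \(b\), and \(\eta\) correspond to the bonus-scale and prior-strength knobs the paper ablates.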
Problem

Research questions and friction points this paper is trying to address.

Evaluates Feel-Good Thompson Sampling in high-dimensional bandits
Compares exact vs approximate posteriors in contextual bandits
Assesses robustness of FG-TS across synthetic and real benchmarks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimism bonus enhances exploration in TS
Systematic benchmark of FG-TS with approximate posteriors
Easy-to-use FG-TS variants as modern baselines
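As an illustration of how the pieces above fit together, here is a minimal, hypothetical sketch of FG-TS on a toy linear bandit, with the posterior sampled by unadjusted Langevin dynamics (the SGLD-style regime the paper studies). The environment, hyperparameters (`eta`, `lam`, the step-size schedule), and the uncapped bonus are illustrative assumptions, not the paper's actual settings:

```python
import numpy as np

# Hypothetical toy setup: linear contextual bandit with Gaussian features.
rng = np.random.default_rng(0)
d, K, T = 5, 4, 300                   # feature dim, arms per round, horizon
theta_star = rng.normal(size=d) / np.sqrt(d)

eta, lam, prior_var = 1.0, 0.1, 1.0   # likelihood temp, feel-good scale, prior var
n_mcmc = 20                           # Langevin steps per round (illustrative)

A = np.zeros((d, d))                  # sufficient statistics for the squared loss
b = np.zeros(d)
ctx_hist = []                         # full contexts, needed by the bonus term
theta = np.zeros(d)
regret = 0.0

def grad_log_post(theta, ctxs):
    """Gradient of the (sketched) FG-TS log-posterior."""
    g = -theta / prior_var + 2.0 * eta * (b - A @ theta)
    if len(ctxs):
        scores = ctxs @ theta                               # (t, K) arm scores
        best = ctxs[np.arange(len(ctxs)), scores.argmax(1)]
        # Feel-good bonus: pull theta toward models predicting high best-arm reward.
        g += lam * best.sum(axis=0)
    return g

for t in range(T):
    ctx = rng.normal(size=(K, d))     # this round's K candidate-arm features
    ctxs = np.asarray(ctx_hist) if ctx_hist else np.empty((0, K, d))
    step = 0.05 / (t + 1.0)           # decaying step keeps the chain stable
    for _ in range(n_mcmc):           # unadjusted Langevin (SGLD-style) sampling
        theta = (theta + step * grad_log_post(theta, ctxs)
                 + np.sqrt(2.0 * step) * rng.normal(size=d))
    a = int(np.argmax(ctx @ theta))   # act greedily on the posterior sample
    r = ctx[a] @ theta_star + 0.1 * rng.normal()
    A += np.outer(ctx[a], ctx[a]); b += r * ctx[a]; ctx_hist.append(ctx)
    regret += np.max(ctx @ theta_star) - ctx[a] @ theta_star

print(f"cumulative regret over {T} rounds: {regret:.2f}")
```

Setting `lam = 0.0` turns the same loop into vanilla Langevin Thompson Sampling, which makes this a convenient harness for reproducing the paper's FG-TS-vs-TS comparison in miniature.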
Emile Anand
Georgia Institute of Technology School of Computer Science
Sarah Liaw
California Institute of Technology
computational mathematics · statistics · machine learning