🤖 AI Summary
In Thompson sampling–based recommender systems, a key bottleneck for counterfactual inference is the inaccessibility of action propensities, i.e. the probability of selecting an arm given the observed context. Method: The paper derives exact, efficiently computable propensity expressions for Thompson sampling under a variety of parametric models and outcome distributions. Building on these expressions, it develops an offline policy evaluation framework that yields unbiased estimates without requiring propensities to have been logged by the behaviour policy. Contribution/Results: (1) a rigorous theoretical bridge between Thompson sampling and counterfactual estimation; (2) the first general analytical treatment of action propensities under Thompson sampling; (3) improved reliability and practicality of offline evaluation for Bayesian online decision-making. Empirical validation confirms effectiveness in recommender-system and online-advertising settings.
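To make the role of propensities concrete, here is a minimal sketch of inverse propensity scoring (IPS), the standard off-policy estimator the summary alludes to. This is a generic textbook illustration, not the paper's framework: the function names, the uniform logging policy, and the toy reward model are all assumptions introduced for the example.

```python
import random

def ips_estimate(logged, target_prob):
    # Inverse propensity scoring: reweight each logged reward by the ratio
    # of the target policy's action probability to the logging propensity.
    # logged: iterable of (context, action, reward, logging_propensity).
    total = 0.0
    n = 0
    for x, a, r, p in logged:
        total += r * target_prob(x, a) / p
        n += 1
    return total / n

# Toy check (assumed setup): the logging policy picks one of two actions
# uniformly (propensity 0.5); the reward is 1 for action 0, else 0.
# A target policy that always picks action 0 has true value 1.0,
# which IPS recovers from the logged data.
rng = random.Random(0)
logged = [(None, a, 1.0 if a == 0 else 0.0, 0.5)
          for a in (rng.randrange(2) for _ in range(20_000))]
always_zero = lambda x, a: 1.0 if a == 0 else 0.0
print(ips_estimate(logged, always_zero))
```

The estimator is unbiased only when the logging propensity `p` is known for every logged action, which is exactly what is hard to obtain under Thompson sampling and what the paper's derivations supply.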
📝 Abstract
Recommender systems exemplify sequential decision-making under uncertainty: they strategically decide what content to serve to users in order to optimise a range of potential objectives. To balance the explore-exploit trade-off, Thompson sampling provides a natural and widely used paradigm for probabilistically selecting which action to take. Questions of causal and counterfactual inference, which underpin use-cases like offline evaluation, are not straightforward to answer in these contexts: most existing estimators rely on action propensities, which are not readily available under Thompson sampling procedures. We derive exact and efficiently computable expressions for action propensities under a variety of parameter and outcome distributions, enabling the use of off-policy estimators in Thompson sampling scenarios. This opens up a range of practical use-cases where counterfactual inference is crucial, including unbiased offline evaluation of recommender systems, as well as broader applications of causal inference in online advertising, personalisation, and beyond.