AI Summary
This work proposes the Conditionally Coupled Contextual (C3) Thompson Sampling algorithm to address a non-stationary contextual bandit setting commonly encountered in recommendation systems, characterized by dense arm features, nonlinear reward functions, and time-varying contexts that preserve temporal correlations. C3 is the first method to jointly model these challenges within a unified framework, combining an enhanced Nadaraya-Watson estimator in the embedding space with Thompson sampling to enable efficient online learning without frequent model retraining. Empirical evaluations demonstrate that C3 reduces average cumulative regret by 5.7% across four OpenML tabular datasets and achieves a 12.4% improvement in click-through rate on the MIND news recommendation benchmark, highlighting its effectiveness in real-world dynamic environments.
Abstract
Contextual bandits are useful in many practical problems. We go one step further by devising a more realistic problem that combines: (1) contextual bandits with dense arm features, (2) non-linear reward functions, and (3) a generalization of correlated bandits in which reward distributions change over time while the degree of correlation is preserved. This formulation lends itself to a wider set of applications, such as recommendation tasks. To solve this problem, we introduce conditionally coupled contextual (C3) Thompson sampling for Bernoulli bandits. It combines an improved Nadaraya-Watson estimator on an embedding space with Thompson sampling, allowing online learning without retraining. Empirical results show that C3 outperforms the next best algorithm with 5.7% lower average cumulative regret on four OpenML tabular datasets, and demonstrates a 12.4% click lift on the Microsoft News Dataset (MIND) compared to other algorithms.
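The core mechanism described above, a kernel-smoothed reward estimate driving Thompson sampling for Bernoulli rewards, can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names, the Gaussian kernel choice, and the conversion of kernel weights into Beta pseudo-counts are all assumptions made here for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def nw_estimate(x, X, r, h=0.5):
    """Nadaraya-Watson estimate of the Bernoulli mean at context embedding x.

    X: (n, d) past context embeddings for one arm; r: (n,) binary rewards.
    Returns (p_hat, n_eff): kernel-weighted mean and effective sample size.
    """
    if len(r) == 0:
        return 0.5, 0.0  # cold start: uninformative prior mean, no evidence
    # Gaussian kernel weights in the embedding space (bandwidth h is illustrative)
    w = np.exp(-np.sum((X - x) ** 2, axis=1) / (2 * h ** 2))
    s = w.sum()
    if s == 0.0:
        return 0.5, 0.0
    return float(w @ r / s), float(s)

def choose_arm(x, histories, h=0.5):
    """Thompson sampling over kernel-smoothed Beta posteriors, one per arm.

    histories: list of (X, r) pairs, one per arm. The smoothed estimate is
    turned into Beta(1 + p*n_eff, 1 + (1-p)*n_eff) pseudo-counts (an
    assumption of this sketch), sampled, and the arm with the largest
    sample is played -- no model retraining is needed between rounds.
    """
    samples = []
    for X, r in histories:
        p, n_eff = nw_estimate(x, X, r, h)
        samples.append(rng.beta(1.0 + p * n_eff, 1.0 + (1.0 - p) * n_eff))
    return int(np.argmax(samples))
```

Because the posterior is built from kernel weights rather than fitted parameters, appending each new (embedding, reward) pair to the arm's history is the entire online update, which is the property the abstract highlights.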