Sparse Nonparametric Contextual Bandits

📅 2025-03-20
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This paper studies the sparse nonparametric contextual bandit problem: given an infinite set of candidate features, only a small unknown subset is relevant, and the goal is to simultaneously identify these critical features and minimize cumulative regret. We first establish a minimax regret lower bound for this setting. We then propose an enhanced Feel-Good Thompson Sampling algorithm whose regret upper bound matches the lower bound up to logarithmic factors. Our approach integrates nonparametric function modeling, sparse feature selection, kernel methods, and generalization analysis of neural networks; the derived regret upper bound scales polynomially in the number of actions and logarithmically in the effective number of candidate features. Experiments demonstrate that incorporating sparsity significantly improves the long-term performance of both kernel-based and neural-network-based contextual bandits.

๐Ÿ“ Abstract
This paper studies the problem of simultaneously learning relevant features and minimising regret in contextual bandit problems. We introduce and analyse a new class of contextual bandit problems, called sparse nonparametric contextual bandits, in which the expected reward function lies in the linear span of a small unknown set of features that belongs to a known infinite set of candidate features. We consider two notions of sparsity, for which the set of candidate features is either countable or uncountable. Our contribution is two-fold. First, we provide lower bounds on the minimax regret, which show that polynomial dependence on the number of actions is generally unavoidable in this setting. Second, we show that a variant of the Feel-Good Thompson Sampling algorithm enjoys regret bounds that match our lower bounds up to logarithmic factors of the horizon, and have logarithmic dependence on the effective number of candidate features. When we apply our results to kernelised and neural contextual bandits, we find that sparsity always enables better regret bounds, as long as the horizon is large enough relative to the sparsity and the number of actions.
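The setting in the abstract can be illustrated with a minimal sketch of plain linear Thompson Sampling on a sparse toy problem. All parameters below are hypothetical choices for illustration, not from the paper: sparsity is simulated by a reward vector with only `s` non-zero coordinates out of `d` candidate features, and the paper's Feel-Good variant (which adds an extra optimism term to the likelihood) and its nonparametric machinery are not reproduced here.

```python
import numpy as np

# Hypothetical toy parameters: d candidate features, only s relevant,
# K actions per round, horizon T.
rng = np.random.default_rng(0)
d, s, K, T = 20, 2, 5, 2000
noise, lam = 0.1, 1.0

theta_true = np.zeros(d)
theta_true[:s] = [1.0, -1.0]      # the small unknown relevant subset

A = lam * np.eye(d)               # Gaussian posterior precision
b = np.zeros(d)                   # precision-weighted mean accumulator
regret = 0.0
for t in range(T):
    X = rng.normal(size=(K, d)) / np.sqrt(d)      # K action feature vectors
    cov = np.linalg.inv(A)
    mean = cov @ b
    theta_s = rng.multivariate_normal(mean, cov)  # sample from the posterior
    a = int(np.argmax(X @ theta_s))               # act greedily on the sample
    r = X[a] @ theta_true + noise * rng.normal()  # observe noisy reward
    regret += np.max(X @ theta_true) - X[a] @ theta_true
    A += np.outer(X[a], X[a]) / noise**2          # Bayesian linear update
    b += X[a] * r / noise**2

avg_regret = regret / T
print(avg_regret)
```

In this sketch the posterior concentrates on the two relevant coordinates and the per-round regret shrinks over the horizon; a sparsity-aware method in the spirit of the paper would additionally place a sparsity-inducing prior over the candidate features rather than the isotropic Gaussian used here.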
Problem

Research questions and friction points this paper is trying to address.

Learning relevant features in contextual bandits
Minimizing regret with sparse nonparametric models
Analyzing regret bounds for kernel and neural bandits
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparse nonparametric contextual bandits introduced
Feel-Good Thompson Sampling variant analyzed
Regret bounds improved with sparsity conditions