🤖 AI Summary
This paper studies kernelized bandits in a reproducing kernel Hilbert space (RKHS), where the reward function has bounded RKHS norm and the action set is a compact subset of ℝᵈ. To address the exploration–exploitation trade-off, the authors propose GP-Generic, a unified algorithmic framework whose key innovation is an "exploration distribution" that subsumes both upper-confidence-bound (UCB) and randomized strategies. With a suitable choice of exploration distribution, GP-Generic achieves a cumulative regret bound of Õ(γ_T√T), where γ_T denotes the maximum information gain after T rounds. The framework is agnostic to the kernel choice and sampling mechanism, and under mild conditions it matches the regret guarantees of UCB and Thompson Sampling. Empirical results show that well-designed stochastic exploration distributions can outperform deterministic policies, combining theoretical guarantees with practical efficiency.
📝 Abstract
We consider a kernelized bandit problem with a compact arm set $\mathcal{X} \subset \mathbb{R}^d$ and a fixed but unknown reward function $f^*$ with a finite norm in some Reproducing Kernel Hilbert Space (RKHS). We propose a class of computationally efficient kernelized bandit algorithms, which we call GP-Generic, based on a novel concept: exploration distributions. This class of algorithms includes Upper Confidence Bound-based approaches as a special case, but also allows for a variety of randomized algorithms. With a careful choice of exploration distribution, our proposed generic algorithm realizes a wide range of concrete algorithms that achieve $\tilde{O}(\gamma_T \sqrt{T})$ regret bounds, where $\gamma_T$ characterizes the RKHS complexity. This matches known results for UCB- and Thompson Sampling-based algorithms; we also show that randomization can yield better empirical performance.
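To make the exploration-distribution idea concrete, here is a minimal sketch of one round of such an algorithm, assuming (hypothetically) that the acquisition score takes the form $\mu_t(x) + z_t \sigma_t(x)$ with a scalar $z_t$ drawn from the exploration distribution: a point mass at a fixed $\beta$ recovers a GP-UCB-style rule, while a random draw gives a randomized, Thompson-Sampling-flavoured rule. The kernel, candidate grid, and function names below are illustrative, not the paper's actual implementation.

```python
import numpy as np

def rbf_kernel(A, B, ls=0.5):
    """Squared-exponential kernel matrix between row sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ls ** 2))

def gp_posterior(X_obs, y_obs, X_cand, noise=0.1):
    """GP posterior mean and standard deviation at candidate points."""
    K = rbf_kernel(X_obs, X_obs) + noise ** 2 * np.eye(len(X_obs))
    k = rbf_kernel(X_obs, X_cand)            # (n_obs, n_cand)
    K_inv = np.linalg.inv(K)
    mu = k.T @ K_inv @ y_obs
    # Prior variance is 1 for this kernel; subtract the explained part.
    var = 1.0 - np.einsum("ij,jk,ki->i", k.T, K_inv, k)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def gp_generic_step(X_obs, y_obs, X_cand, sample_z, rng):
    """One round: play argmax of mu + z * sigma, z ~ exploration dist."""
    mu, sigma = gp_posterior(X_obs, y_obs, X_cand)
    z = sample_z(rng)
    return int(np.argmax(mu + z * sigma))

# Two illustrative exploration distributions:
ucb_like = lambda rng: 2.0                         # point mass -> UCB-style
ts_like = lambda rng: abs(rng.standard_normal())   # randomized exploration
```

Under this (simplified) view, swapping `sample_z` is the only change needed to move between deterministic and randomized exploration, which is the flexibility the GP-Generic framework is designed to expose.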