Efficient kernelized bandit algorithms via exploration distributions

📅 2025-06-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper studies kernelized bandits in a reproducing kernel Hilbert space (RKHS), where the reward function has bounded RKHS norm and the action set is a compact subset of ℝᵈ. To address the exploration–exploitation trade-off, the authors propose GP-Generic, a unified algorithmic framework whose key innovation is an "exploration distribution" that subsumes both upper-confidence-bound (UCB) and randomized strategies. GP-Generic achieves a cumulative regret bound of Õ(γ_T√T), where γ_T denotes the maximum information gain, matching the known bounds for UCB- and Thompson-Sampling-based algorithms. The framework is agnostic to kernel choice and sampling mechanism. Empirical results suggest that well-designed stochastic exploration distributions can outperform deterministic policies in practice while retaining the same regret guarantees.

📝 Abstract
We consider a kernelized bandit problem with a compact arm set $\mathcal{X} \subset \mathbb{R}^d$ and a fixed but unknown reward function $f^*$ with a finite norm in some Reproducing Kernel Hilbert Space (RKHS). We propose a class of computationally efficient kernelized bandit algorithms, which we call GP-Generic, based on a novel concept: exploration distributions. This class of algorithms includes Upper Confidence Bound-based approaches as a special case, but also allows for a variety of randomized algorithms. With careful choice of exploration distribution, our proposed generic algorithm realizes a wide range of concrete algorithms that achieve $\tilde{O}(\gamma_T \sqrt{T})$ regret bounds, where $\gamma_T$ characterizes the RKHS complexity. This matches known results for UCB- and Thompson Sampling-based algorithms; we also show that in practice, randomization can yield better practical results.
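To make the exploration-distribution idea concrete, here is a rough sketch (not the paper's exact algorithm): each round, the learner computes the GP posterior over candidate arms, draws an exploration weight β from a chosen distribution, and plays the arm maximizing μ(x) + β·σ(x). A point mass on a fixed β recovers a UCB-style rule, while a random β gives a randomized strategy. All function names, the RBF kernel choice, and the half-normal sampling below are illustrative assumptions, not details from the paper.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    # Squared-exponential kernel matrix between rows of A and B
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(X_obs, y_obs, X_cand, noise=0.1):
    # GP posterior mean and std at candidate arms, given noisy observations
    K = rbf_kernel(X_obs, X_obs) + noise**2 * np.eye(len(X_obs))
    K_star = rbf_kernel(X_cand, X_obs)
    mu = K_star @ np.linalg.solve(K, y_obs)
    v = np.linalg.solve(K, K_star.T)
    # prior variance k(x, x) = 1 for the RBF kernel used here
    var = 1.0 - np.sum(K_star * v.T, axis=1)
    return mu, np.sqrt(np.maximum(var, 0.0))

def gp_generic_step(X_obs, y_obs, X_cand, sample_beta, rng):
    # One round: draw an exploration weight, then maximize the index mu + beta*sigma
    mu, sigma = gp_posterior(X_obs, y_obs, X_cand)
    beta = sample_beta(rng)  # sample from the exploration distribution
    return int(np.argmax(mu + beta * sigma))

# UCB recovered as a point-mass exploration distribution:
ucb = lambda rng: 2.0
# A randomized variant, e.g. half-normal exploration weights (illustrative):
randomized = lambda rng: abs(rng.normal(0.0, 2.0))
```

With `ucb` the arm choice is deterministic given the data; with `randomized` the same code yields a stochastic policy, which is the flexibility the exploration-distribution abstraction is meant to capture.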
Problem

Research questions and friction points this paper is trying to address.

Efficient algorithms for kernelized bandit problems
Optimizing regret bounds in RKHS settings
Exploring randomized algorithms for better performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Kernelized bandit algorithms via exploration distributions
GP-Generic class includes UCB and randomized approaches
Achieves Õ(γ_T√T) regret bounds with RKHS complexity