Generalized Kernelized Bandits: Self-Normalized Bernstein-Like Dimension-Free Inequality and Regret Bounds

📅 2025-08-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper studies regret minimization in generalized kernelized bandits (GKBs): optimizing an unknown function $f^*$ in a reproducing kernel Hilbert space (RKHS) when observations follow an exponential-family distribution with mean $\mu(f^*)$. The setting unifies kernelized bandits (KBs) and generalized linear bandits (GLBs). We develop a novel self-normalized, dimension-free Bernstein-type inequality, built from Freedman's inequality and a stitching argument, because existing self-normalized concentration inequalities do not yield tight regret guarantees here. Using it, we analyze the optimistic algorithm GKB-UCB and derive a regret bound of $\widetilde{O}(\gamma_T \sqrt{T / \kappa_*})$, where $T$ is the learning horizon, $\gamma_T$ the maximal information gain, and $\kappa_*$ a term characterizing the magnitude of the reward nonlinearity. The bound matches the state-of-the-art rates for both KBs and GLBs up to multiplicative constants and logarithmic terms, giving a unified view of the two settings.

📝 Abstract

We study the regret minimization problem in the novel setting of generalized kernelized bandits (GKBs), where we optimize an unknown function $f^*$ belonging to a reproducing kernel Hilbert space (RKHS) while having access to samples generated by an exponential family (EF) noise model whose mean is a non-linear function $μ(f^*)$. This model extends both kernelized bandits (KBs) and generalized linear bandits (GLBs). We propose an optimistic algorithm, GKB-UCB, and we explain why existing self-normalized concentration inequalities do not yield tight regret guarantees. For this reason, we devise a novel self-normalized Bernstein-like dimension-free inequality that relies on Freedman's inequality and a stitching argument, which represents a contribution of independent interest. Based on it, we conduct a regret analysis of GKB-UCB, deriving a regret bound of order $\widetilde{O}(γ_T \sqrt{T/κ_*})$, where $T$ is the learning horizon, $γ_T$ the maximal information gain, and $κ_*$ a term characterizing the magnitude of the reward nonlinearity. Our result matches, up to multiplicative constants and logarithmic terms, the state-of-the-art bounds for both KBs and GLBs and provides a unified view of both settings.
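
For orientation, the setting described in the abstract can be written out as follows. This is a sketch using the canonical (natural) exponential-family form common in the GLB literature. The log-partition function $m$, the base measure $h$, the omission of any dispersion parameter, and the exact regret definition are assumptions made here for illustration, and the paper's own parameterization may differ.

$$
p(y \mid x_t) = h(y)\,\exp\big(y\, f^*(x_t) - m(f^*(x_t))\big),
\qquad
\mathbb{E}[y_t \mid x_t] = \mu(f^*(x_t)), \quad \mu = m',
$$

$$
R_T = \sum_{t=1}^{T} \Big( \mu(f^*(x^*)) - \mu(f^*(x_t)) \Big),
\qquad
x^* \in \arg\max_{x} \mu(f^*(x)).
$$

Under these assumptions, taking $\mu$ to be the identity recovers a KB-style model, while taking $f^*(x) = \langle \theta^*, x \rangle$ (a linear kernel) recovers a GLB-style model, which is the sense in which the GKB setting extends both.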

Problem

Research questions and friction points this paper is trying to address.

Regret minimization in generalized kernelized bandits

Optimizing unknown RKHS functions under exponential-family noise

Developing tight regret bounds for non-linear reward models

Innovation

Methods, ideas, or system contributions that make the work stand out.

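Proposes GKB-UCB, an optimistic algorithm for generalized kernelized bandits

Introduces a self-normalized, dimension-free Bernstein-like inequality based on Freedman's inequality and a stitching argument

Achieves a regret bound that matches the state of the art for both KBs and GLBs, up to constants and logarithmic terms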
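
The regret bound above scales with the maximal information gain $γ_T$. As a rough illustration of that quantity, the sketch below evaluates the standard information-gain expression $\tfrac{1}{2}\log\det(I + \lambda^{-1} K_t)$ for one fixed set of points. The RBF kernel, the lengthscale, the regularizer `lam`, and the function names are assumptions chosen for illustration and do not come from the paper. Because $γ_T$ is a maximum over all point sets of size $T$, the value for a fixed set is only a lower bound on it.

```python
import numpy as np


def rbf_kernel(X, Y, lengthscale=1.0):
    """RBF (squared-exponential) kernel matrix between rows of X and Y."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-0.5 * np.maximum(sq_dists, 0.0) / lengthscale**2)


def information_gain(X, lam=1.0, lengthscale=1.0):
    """0.5 * log det(I + K / lam) for the points in X (hypothetical helper)."""
    K = rbf_kernel(X, X, lengthscale)
    _, logdet = np.linalg.slogdet(np.eye(len(X)) + K / lam)
    return 0.5 * logdet


# Illustrative inputs only: 50 random points in [0, 1].
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(50, 1))
gains = [information_gain(X[:t]) for t in (10, 25, 50)]
print(gains)
```

The information gain of a nested sequence of point sets is non-decreasing, so the printed values grow with the number of points. How fast the maximal version grows with $T$ depends on the kernel, which is why the bound is stated in terms of $γ_T$ rather than an explicit dimension.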
