Near-Optimal Regret in Adversarial Kernel Bandits

📅 2026-05-26

🤖 AI Summary

This work addresses regret minimization in adversarial kernel bandits, where the loss at each round is induced by an arbitrary bounded function in a reproducing kernel Hilbert space (RKHS). The authors propose an exponential-weights algorithm that combines a regularized importance-weighted loss estimator with an explicit term that corrects the bias introduced by the regularization. The resulting regret bound is Õ(√(T·d_*(λ)·log|X|)), where d_*(λ) is an effective dimension of the kernel. Up to logarithmic factors, this matches the rate known for the stochastic kernel bandit setting, and it is obtained without the rank-one adversary assumption required by prior work. For Matérn(ν, d) kernels, the bound yields Õ(T^{(ν+d)/(2ν+d)}), which improves on the best previously known adversarial rate, equals the known optimal rate for stochastic kernel bandits, and matches a lower bound from concurrent work up to a log T factor.
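A minimal sketch of the estimator-plus-exponential-weights loop, under stated assumptions: a finite action set X with Gram matrix K, losses of the form ℓ_t(x) = ⟨θ_t, φ(x)⟩, and a generic regularized importance-weighted estimator written in kernel form. The paper's explicit bias-correction term, and any exploration mixing it may use, are not specified here and are omitted. The Gaussian kernel and the helper names `gaussian_kernel`, `regularized_iw_estimate` and `exp_weights_kernel_bandit` are hypothetical illustrations, not the authors' code.

```python
import numpy as np


def gaussian_kernel(X, lengthscale=0.5):
    """Gram matrix of a Gaussian (RBF) kernel on the rows of X (illustrative kernel choice)."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * lengthscale**2))


def regularized_iw_estimate(K, p, a, y, lam):
    """Estimated loss for every action after playing action a and observing loss y.

    Feature-space form: phi(x)^T (Sigma_p + lam*I)^{-1} phi(a) * y, with
    Sigma_p = sum_x p(x) phi(x) phi(x)^T. By the push-through identity this
    equals entry x of K (diag(p) K + lam*I)^{-1} e_a * y, so only K is needed.
    """
    n = K.shape[0]
    rhs = np.zeros(n)
    rhs[a] = y  # e_a scaled by the observed loss
    return K @ np.linalg.solve(np.diag(p) @ K + lam * np.eye(n), rhs)


def exp_weights_kernel_bandit(K, losses, eta, lam, rng):
    """Exponential weights driven by the regularized estimator (no bias correction).

    losses has shape (T, n): the adversary's loss vector for each round.
    Returns the total loss incurred by the learner.
    """
    T, n = losses.shape
    cum_est = np.zeros(n)  # cumulative estimated loss of each action
    total = 0.0
    for t in range(T):
        # Shift by the minimum before exponentiating for numerical stability.
        w = np.exp(-eta * (cum_est - cum_est.min()))
        p = w / w.sum()
        a = rng.choice(n, p=p)
        y = losses[t, a]  # bandit feedback: only the played action's loss
        total += y
        cum_est += regularized_iw_estimate(K, p, a, y, lam)
    return total
```

Averaging this estimate over a ∼ p gives K(diag(p)K + λI)^{-1} diag(p) ℓ rather than ℓ itself. That λ-dependent gap is the regularization bias that the correction term described in the abstract is meant to cancel.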

📝 Abstract

We study the adversarial kernel bandit problem, in which the loss at each round is induced by an arbitrary bounded element of a reproducing kernel Hilbert space (RKHS). We propose an exponential-weights algorithm built on a regularized importance-weighted loss estimator, together with an explicit correction term that cancels the bias introduced by the regularization. Our main result bounds the regret by $\widetilde{O}\big(\sqrt{T\, d_*(\lambda)\,\log|\mathcal{X}|}\big)$, where $d_*(\lambda)$ is a widely adopted notion of effective dimension that captures the complexity of the kernel. Up to logarithmic factors, this matches the known rate achieved in the related stochastic kernel bandit problem. A notable application is the Matérn$(\nu,d)$ kernel with smoothness parameter $\nu$ on $\mathbb{R}^d$, for which our bound specializes to $\widetilde{O}\big(T^{(\nu+d)/(2\nu+d)}\big)$, improving over the best-known prior rate of Chatterji et al. [2019] while simultaneously removing the rank-one adversary assumption required by their analysis. Moreover, this rate is the same as the known optimal rate for stochastic kernel bandits, and also matches a lower bound from concurrent work up to a $\log T$ factor.
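The Matérn specialization can be read off the general bound through simple exponent arithmetic. The derivation below is a sketch under assumptions not stated in the abstract: that $\lambda$ is tuned so that $d_*(\lambda) = \widetilde{O}\big(T^{d/(2\nu+d)}\big)$, and that $\log|\mathcal{X}| = O(\log T)$ (for example, a discretization whose size is polynomial in $T$), so it is absorbed into $\widetilde{O}$.

```latex
\sqrt{T\, d_*(\lambda)\, \log|\mathcal{X}|}
  = \widetilde{O}\Big(\sqrt{T \cdot T^{d/(2\nu+d)}}\Big)
  = \widetilde{O}\Big(T^{\frac{1}{2}\left(1 + \frac{d}{2\nu+d}\right)}\Big)
  = \widetilde{O}\Big(T^{\frac{2\nu+2d}{2(2\nu+d)}}\Big)
  = \widetilde{O}\Big(T^{\frac{\nu+d}{2\nu+d}}\Big).
```

For instance, $\nu = d$ gives $\widetilde{O}(T^{2/3})$, and the exponent approaches $1/2$ as $\nu \to \infty$.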

Problem

Research questions and friction points this paper is trying to address.

adversarial kernel bandits

regret minimization

reproducing kernel Hilbert space

effective dimension

Innovation

Methods, ideas, or system contributions that make the work stand out.

adversarial kernel bandits

exponential-weights algorithm

regularized importance-weighted estimator

effective dimension