🤖 AI Summary
This work addresses regret minimization in adversarial kernel bandits, where losses are induced by arbitrary bounded functions in a reproducing kernel Hilbert space (RKHS). The authors propose an exponential weights algorithm that combines regularized importance-weighted loss estimation with an explicit correction for the bias that the regularization introduces. The approach achieves a near-optimal regret bound that matches, up to logarithmic factors, the rate known for the stochastic setting, and it does so without the rank-one adversary assumption required by the earlier analysis of Chatterji et al. [2019]. The analysis draws on RKHS theory, an effective-dimension argument, and a carefully designed regularization scheme; the resulting regret bound is Õ(√(T·d_*(λ)·log|X|)). For Matérn kernels, this yields a rate of Õ(T^{(ν+d)/(2ν+d)}), which coincides with the optimal rate known in stochastic environments.
📝 Abstract
We study the adversarial kernel bandit problem, in which the loss at each round is induced by an arbitrary bounded element of a reproducing kernel Hilbert space (RKHS). We propose an exponential-weights algorithm built on a regularized importance-weighted loss estimator, together with an explicit correction term that cancels the bias introduced by the regularization. Our main result bounds the regret by $\widetilde{O}\big(\sqrt{T\, d_*(\lambda)\,\log|{X}|}\big)$, where $d_*(\lambda)$ is a widely adopted notion of effective dimension that captures the complexity of the kernel. Up to logarithmic factors, this matches the known rate achieved in the related stochastic kernel bandit problem. A notable application is the Matérn$(\nu,d)$ kernel with smoothness parameter $\nu$ on $\mathbb{R}^d$, for which our bound specializes to $\widetilde{O}\big(T^{(\nu+d)/(2\nu+d)}\big)$, improving over the best-known prior rate of Chatterji et al. [2019] while simultaneously removing the rank-one adversary assumption required by their analysis. Moreover, this rate coincides with the known optimal rate for stochastic kernel bandits, and also matches a lower bound from concurrent work up to a $\log T$ factor.
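As a rough sanity check on how the general bound and the Matérn rate fit together, the exponent arithmetic can be sketched as follows. The sketch assumes that the effective dimension grows like $T^{d/(2\nu+d)}$ for a suitable choice of the regularization parameter; this scaling is an assumption made here for illustration and is not stated in the abstract. The function names are hypothetical, and logarithmic factors are ignored.

```python
from fractions import Fraction

def matern_regret_exponent(nu, d):
    """Exponent of T in sqrt(T * d_eff), ignoring logarithmic factors.

    Assumption (not taken from the abstract): the effective dimension
    d_eff grows like T^(d / (2*nu + d)) for the Matern(nu, d) kernel.
    """
    nu, d = Fraction(nu), Fraction(d)
    d_eff_exponent = d / (2 * nu + d)   # assumed growth of the effective dimension
    return (1 + d_eff_exponent) / 2     # the square root halves the total exponent

def closed_form_exponent(nu, d):
    """Exponent (nu + d) / (2*nu + d) quoted for the Matern rate."""
    nu, d = Fraction(nu), Fraction(d)
    return (nu + d) / (2 * nu + d)

# Check that the two expressions agree for a few (nu, d) pairs.
for nu, d in [(Fraction(1, 2), 1), (Fraction(3, 2), 2), (Fraction(5, 2), 3)]:
    exponent = matern_regret_exponent(nu, d)
    assert exponent == closed_form_exponent(nu, d)
    print(f"nu={nu}, d={d}: regret exponent = {exponent}")
```

Under this assumption the exponent always lies strictly between 1/2 and 1, and it approaches 1/2 as the smoothness $\nu$ grows with $d$ held fixed.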