🤖 AI Summary
This paper studies the Gaussian process (GP) bandit problem, which aims to minimize regret when the unknown reward function lies in a reproducing kernel Hilbert space (RKHS). Its central result is a new upper bound on the maximum posterior variance with improved dependence on the noise variance parameter of the GP. Using this bound, the authors refine maximum variance reduction (MVR) and phased elimination (PE) and obtain three sets of results: a nearly optimal regret upper bound in the noiseless setting; regret upper bounds that are optimal with respect to the RKHS norm of the reward function; and, for time-varying noise variance (the kernelized extension of the linear bandit with heteroscedastic noise), noise-variance-dependent regret upper bounds that match the paper's regret lower bound.
📝 Abstract
We study the Gaussian process (GP) bandit problem, whose goal is to minimize regret under an unknown reward function lying in some reproducing kernel Hilbert space (RKHS). The analysis of the maximum posterior variance is vital in analyzing near-optimal GP bandit algorithms such as maximum variance reduction (MVR) and phased elimination (PE). We therefore first establish a new upper bound on the maximum posterior variance, which improves the dependence on the noise variance parameter of the GP. By leveraging this result, we refine MVR and PE to obtain (i) a nearly optimal regret upper bound in the noiseless setting and (ii) regret upper bounds that are optimal with respect to the RKHS norm of the reward function. Furthermore, as another application of the proposed bound, we analyze the GP bandit under the time-varying noise variance setting, which is the kernelized extension of the linear bandit with heteroscedastic noise. For this problem, we show that MVR- and PE-based algorithms achieve noise-variance-dependent regret upper bounds that match our regret lower bound.
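To make the role of the maximum posterior variance concrete, the following is a minimal sketch of the generic MVR scheme: at each round it queries the point with the largest GP posterior variance, and at the end it reports the maximizer of the posterior mean. It is not the refined algorithm from the paper. The RBF kernel, the one-dimensional grid, the helper names (`rbf_kernel`, `gp_posterior`, `mvr`), and the use of a single `noise_var` value both as the GP regularization parameter and as the simulated observation noise are all assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=0.2):
    """Squared-exponential kernel on 1-D inputs (assumed kernel, k(x, x) = 1)."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * d2 / lengthscale**2)


def gp_posterior(x_obs, y_obs, x_grid, noise_var, lengthscale=0.2):
    """GP posterior mean and variance on x_grid given noisy observations."""
    if len(x_obs) == 0:
        # No data yet: the posterior equals the zero-mean, unit-variance prior.
        return np.zeros(len(x_grid)), np.ones(len(x_grid))
    gram = rbf_kernel(x_obs, x_obs, lengthscale) + noise_var * np.eye(len(x_obs))
    k_star = rbf_kernel(x_obs, x_grid, lengthscale)      # shape (n, m)
    solved = np.linalg.solve(gram, k_star)               # (K + noise_var I)^{-1} k_*
    mean = solved.T @ y_obs
    var = 1.0 - np.sum(k_star * solved, axis=0)
    return mean, np.maximum(var, 0.0)                    # clip round-off negatives


def mvr(f, x_grid, n_rounds, noise_var, rng):
    """Generic MVR: query argmax of posterior variance, report argmax of mean."""
    x_obs = np.empty(0)
    y_obs = np.empty(0)
    max_vars = []  # maximum posterior variance before each query
    for _ in range(n_rounds):
        _, var = gp_posterior(x_obs, y_obs, x_grid, noise_var)
        i = int(np.argmax(var))
        max_vars.append(float(var[i]))
        y = f(x_grid[i]) + rng.normal(0.0, np.sqrt(noise_var))
        x_obs = np.append(x_obs, x_grid[i])
        y_obs = np.append(y_obs, y)
    mean, _ = gp_posterior(x_obs, y_obs, x_grid, noise_var)
    return float(x_grid[int(np.argmax(mean))]), max_vars


# Hypothetical usage on a toy reward function.
grid = np.linspace(0.0, 1.0, 101)
x_hat, max_vars = mvr(lambda x: np.sin(3.0 * x), grid, n_rounds=15,
                      noise_var=0.01, rng=np.random.default_rng(0))
print(x_hat, max_vars[-1])
```

In this sketch the quality of the final estimate is governed by how quickly the entries of `max_vars` shrink. That quantity is what an upper bound on the maximum posterior variance controls, and its decay depends on `noise_var`.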