Improved Regret Analysis in Gaussian Process Bandits: Optimality for Noiseless Reward, RKHS norm, and Non-Stationary Variance

📅 2025-02-10

🤖 AI Summary

This paper studies the Gaussian process (GP) bandit problem, where the goal is to minimize regret under an unknown reward function lying in a reproducing kernel Hilbert space (RKHS). Its central technical result is a new upper bound on the maximum posterior variance with improved dependence on the noise variance parameter of the GP. Using this bound, the authors refine maximum variance reduction (MVR) and phased elimination (PE) to obtain a nearly optimal regret upper bound in the noiseless setting and regret upper bounds that are optimal with respect to the RKHS norm of the reward function. The same bound is applied to the time-varying noise variance setting, the kernelized extension of the linear bandit with heteroscedastic noise, where MVR- and PE-based algorithms achieve noise-variance-dependent regret upper bounds that match the paper's regret lower bound.

📝 Abstract

We study the Gaussian process (GP) bandit problem, whose goal is to minimize regret under an unknown reward function lying in some reproducing kernel Hilbert space (RKHS). The analysis of the maximum posterior variance is vital in analyzing near-optimal GP bandit algorithms such as maximum variance reduction (MVR) and phased elimination (PE). Therefore, we first show a new upper bound on the maximum posterior variance, which improves the dependence on the noise variance parameter of the GP. By leveraging this result, we refine MVR and PE to obtain (i) a nearly optimal regret upper bound in the noiseless setting and (ii) regret upper bounds that are optimal with respect to the RKHS norm of the reward function. Furthermore, as another application of our proposed bound, we analyze the GP bandit under the time-varying noise variance setting, which is the kernelized extension of the linear bandit with heteroscedastic noise. For this problem, we show that MVR- and PE-based algorithms achieve noise-variance-dependent regret upper bounds, which match our regret lower bound.
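The role of the maximum posterior variance can be illustrated with a minimal sketch of plain maximum variance reduction: at each round, query the candidate point where the GP posterior variance is largest. This is only an illustration under stated assumptions, not the refined algorithm analyzed in the paper. The RBF kernel, the one-dimensional candidate grid, and the names `rbf_kernel`, `posterior_variance`, and `mvr_queries` are all hypothetical choices made for this example.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=0.2):
    """Squared-exponential kernel between 1-D point sets a and b (assumed kernel)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)


def posterior_variance(x_obs, x_cand, noise_var, lengthscale=0.2):
    """GP posterior variance at x_cand given query locations x_obs.

    The posterior variance depends only on where we queried,
    not on the observed rewards.
    """
    prior_var = np.ones(len(x_cand))  # k(x, x) = 1 for the RBF kernel
    if len(x_obs) == 0:
        return prior_var
    K = rbf_kernel(x_obs, x_obs, lengthscale) + noise_var * np.eye(len(x_obs))
    k_oc = rbf_kernel(x_obs, x_cand, lengthscale)
    sol = np.linalg.solve(K, k_oc)
    return prior_var - np.sum(k_oc * sol, axis=0)


def mvr_queries(x_cand, n_rounds, noise_var):
    """Pick, at each round, the candidate with the largest posterior variance."""
    chosen, max_vars = [], []
    for _ in range(n_rounds):
        var = posterior_variance(np.array(chosen), x_cand, noise_var)
        i = int(np.argmax(var))
        max_vars.append(float(var[i]))  # maximum posterior variance this round
        chosen.append(x_cand[i])
    return np.array(chosen), max_vars


grid = np.linspace(0.0, 1.0, 101)
queries, max_vars = mvr_queries(grid, n_rounds=10, noise_var=1e-2)
```

The sequence `max_vars` records how the maximum posterior variance shrinks as queries accumulate; how fast it shrinks, and how that rate depends on `noise_var`, is the kind of quantity the abstract's new upper bound controls.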

Problem

Research questions and friction points this paper is trying to address.

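Tighten the regret analysis of Gaussian process bandits with an unknown reward function in an RKHS.

Obtain nearly optimal regret bounds in the noiseless reward setting and bounds that are optimal in the RKHS norm.

Handle time-varying (heteroscedastic) noise variance in GP bandit algorithms.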

Innovation

Methods, ideas, or system contributions that make the work stand out.

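New upper bound on the maximum posterior variance with improved noise-variance dependence

Refined MVR and PE algorithms

Noise-variance-dependent regret bounds for the time-varying noise variance setting, with a matching lower bound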
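To make the time-varying noise variance setting concrete, here is a minimal sketch of the GP posterior variance when each observation has its own noise variance. It uses the standard heteroscedastic GP regression formula, with the scalar noise term replaced by a diagonal matrix of per-round variances. Whether the paper's algorithms use exactly this estimator is an assumption; the RBF kernel and the names `rbf_kernel` and `hetero_posterior_variance` are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=0.2):
    """Squared-exponential kernel between 1-D point sets a and b (assumed kernel)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)


def hetero_posterior_variance(x_obs, noise_vars, x_cand, lengthscale=0.2):
    """Posterior variance at x_cand when observation t has noise variance noise_vars[t]."""
    # Per-observation noise enters as a diagonal matrix instead of noise_var * I.
    K = rbf_kernel(x_obs, x_obs, lengthscale) + np.diag(noise_vars)
    k_oc = rbf_kernel(x_obs, x_cand, lengthscale)
    sol = np.linalg.solve(K, k_oc)
    return 1.0 - np.sum(k_oc * sol, axis=0)  # k(x, x) = 1 for the RBF kernel


x = np.array([0.5])
# One observation at x = 0.5: the variance there becomes s / (1 + s) for noise variance s.
v_noisy = hetero_posterior_variance(x, np.array([0.25]), x)
v_clean = hetero_posterior_variance(x, np.array([0.01]), x)
```

A low-noise observation reduces the posterior variance more than a high-noise one, which is why regret bounds in this setting can depend on the actual sequence of noise variances instead of a single worst-case value.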
