🤖 AI Summary
We address the no-regret optimization of time-varying black-box functions under pure bandit feedback. We first show that standard GP-bandit algorithms fail to achieve no-regret in dynamic environments. To overcome this, we propose W-SparQ-GP-UCB, the first algorithm to incorporate uncertainty injection into time-varying Gaussian process (GP) optimization. It jointly employs heteroscedastic GP modeling, sparse inference, and RKHS norm constraints to explicitly characterize the function's non-stationarity. By re-querying historical points and adaptively updating the model, it attains no-regret with an additional query rate that vanishes asymptotically. Theoretically, we establish the first lower bound on the extra query overhead required for no-regret, quantify the fundamental trade-off between the temporal variation rate and the achievable regret rate, and provide matching upper and lower bounds, thereby fully characterizing the statistical limits of no-regret learning for time-varying GPs under bandit feedback.
📝 Abstract
Sequential optimization of black-box functions from noisy evaluations has been widely studied, with Gaussian Process bandit algorithms such as GP-UCB guaranteeing no-regret in stationary settings. However, for time-varying objectives, it is known that no-regret is unattainable under pure bandit feedback unless strong and often unrealistic assumptions are imposed.
In this article, we propose a novel method to optimize time-varying rewards in the frequentist setting, where the objective has bounded RKHS norm. Time variations are captured through uncertainty injection (UI), which enables heteroscedastic GP regression that adapts past observations to the current time step. As no-regret is unattainable in general in the strict bandit setting, we relax the latter by allowing additional queries on previously observed points. Building on sparse inference and the effect of UI on regret, we propose W-SparQ-GP-UCB, an online algorithm that achieves no-regret with only a vanishing number of additional queries per iteration. To assess the theoretical limits of this approach, we establish a lower bound on the number of additional queries required for no-regret, proving the efficiency of our method. Finally, we provide a comprehensive analysis linking the degree of time-variation of the function to the achievable regret rates, together with upper and lower bounds on the number of additional queries needed in each regime.
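To make the uncertainty-injection idea concrete, here is a minimal, hypothetical sketch of one acquisition step: past observations enter a heteroscedastic GP posterior with age-proportional noise inflation on the diagonal (so stale data is trusted less), and the next query point is chosen by UCB. This is an illustration under simplifying assumptions (1-D input, RBF kernel, linear injection schedule), not the paper's W-SparQ-GP-UCB, which additionally uses sparse inference and re-queries of historical points; all names and parameters here (`injection_rate`, `beta`, etc.) are invented for the example.

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=0.2):
    # Squared-exponential kernel with unit signal variance.
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def ucb_with_uncertainty_injection(x_obs, y_obs, ages, x_cand,
                                   noise_var=0.01, injection_rate=0.05,
                                   beta=2.0):
    """One UCB step on a heteroscedastic GP posterior.

    Uncertainty injection (illustrative form): each observation's noise
    variance grows linearly with its age, so old data is down-weighted
    instead of discarded.
    """
    K = rbf_kernel(x_obs, x_obs)
    # Age-dependent noise on the diagonal: the injection step.
    K += np.diag(noise_var + injection_rate * ages)
    K_s = rbf_kernel(x_obs, x_cand)
    alpha = np.linalg.solve(K, y_obs)
    mu = K_s.T @ alpha                           # posterior mean
    v = np.linalg.solve(K, K_s)
    var = 1.0 - np.sum(K_s * v, axis=0)          # posterior variance (prior var = 1)
    ucb = mu + beta * np.sqrt(np.maximum(var, 0.0))
    return x_cand[np.argmax(ucb)], ucb
```

Inflating the diagonal entry of an old observation both widens the posterior variance around it and shrinks its influence on the mean, which is exactly the mechanism by which the posterior "forgets" outdated evaluations gracefully.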