🤖 AI Summary
This work investigates black-box optimization in a reproducing kernel Hilbert space (RKHS) under batched noisy feedback, covering both the non-robust setting and an adversarially robust one. It characterizes the optimal number of batches $B$ including constant factors (to within $1+o(1)$), removes an extraneous factor of $B$ from existing regret bounds, and establishes algorithm-independent lower bounds for adaptively chosen batch sizes, showing that adaptive and fixed batch sizes have essentially the same minimax regret scaling. In the robust setting, the proposed robust-BPE algorithm attains, for a suitably defined cumulative regret notion, the same bound as in the non-robust setting, along with a simple regret bound significantly below that of previous work.
📝 Abstract
In this paper, we consider the problem of black-box optimization with noisy feedback revealed in batches, where the unknown function to optimize has a bounded norm in some Reproducing Kernel Hilbert Space (RKHS). We refer to this as the Batched Kernelized Bandits problem, and refine and extend existing results on regret bounds. For algorithmic upper bounds, Li and Scarlett (2022) show that $B=O(\log\log T)$ batches suffice to attain near-optimal regret, where $T$ is the time horizon and $B$ is the number of batches. We further refine this by (i) finding the optimal number of batches including constant factors (to within $1+o(1)$), and (ii) removing a factor of $B$ in the regret bound. For algorithm-independent lower bounds, noting that existing results only apply when the batch sizes are fixed in advance, we present novel lower bounds for the case where the batch sizes are chosen adaptively, and show that adaptive batches have essentially the same minimax regret scaling as fixed batches. Furthermore, we consider a robust setting where the goal is to choose points for which the function value remains high even after an adversarial perturbation. We present the robust-BPE algorithm, show that a suitably defined cumulative regret notion incurs the same bound as in the non-robust setting, and derive a simple regret bound significantly below that of previous work.
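To give some intuition for why $B=O(\log\log T)$ batches can suffice, the following is a minimal sketch of a square-root-style batch schedule. The recursion $N_i = \lceil \sqrt{T N_{i-1}} \rceil$ with $N_0 = 1$ is an assumption made here for illustration; the abstract does not state the schedule used, and this is not the refined schedule that is optimal to within $1+o(1)$. The function name `batch_sizes` is hypothetical.

```python
import math


def batch_sizes(T):
    """Illustrative batch schedule (an assumption, not the paper's exact one).

    Uses N_i = ceil(sqrt(T * N_{i-1})) with N_0 = 1, truncating the final
    batch so that the sizes sum to the time horizon T.
    """
    sizes = []
    prev = 1          # N_0 = 1 (assumed initialization)
    remaining = T     # rounds not yet assigned to a batch
    while remaining > 0:
        x = T * prev
        n = math.isqrt(x)
        if n * n < x:  # round up to get ceil(sqrt(x)) in exact integer arithmetic
            n += 1
        n = min(n, remaining)  # the last batch is cut off at the horizon
        sizes.append(n)
        remaining -= n
        prev = n
    return sizes


if __name__ == "__main__":
    for T in (10**3, 10**6, 10**9):
        sizes = batch_sizes(T)
        print(T, len(sizes), sizes)
```

Under this recursion the batch sizes grow roughly as $T^{1-2^{-i}}$, so the exponent gap to $T$ halves with each batch, which is why the number of batches grows only doubly logarithmically in $T$.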