Nonparametric Bayesian Optimization for General Rewards

📅 2026-02-07
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the lack of a general, theoretically grounded Bayesian optimization framework under uncertain reward models, such as non-stationary, heavy-tailed, or otherwise ill-conditioned noise. It proposes a nonparametric Bayesian optimization approach based on the infinite Gaussian process (∞-GP) coupled with Thompson Sampling. Under minimal assumptions, namely Lipschitz continuity of the objective function and broad noise conditions, the method achieves no-regret optimization. Key contributions include the first no-regret guarantee in this general reward setting, the introduction of the ∞-GP as a prior over the space of reward distributions, which substantially broadens the class of modelable rewards, and a novel regret analysis framework for Thompson Sampling based on total variation distance. Empirical results demonstrate state-of-the-art performance in complex reward environments, with computational overhead comparable to classical Gaussian processes and strong scalability.
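The total-variation argument mentioned above can be illustrated with a standard inequality; the notation here is illustrative and not necessarily the paper's. For any bounded function $g$ with $|g| \le B$, two probability laws over actions that are close in total variation assign it similar expectations, so a Thompson Sampling rule that draws from an approximate surrogate posterior $Q_t$ rather than the true posterior $P_t$ accrues at most a TV-controlled amount of extra regret:

```latex
% Illustrative sketch (assumed notation, not the paper's exact statement):
% for |g| <= B and distributions P, Q over the action space,
\[
\bigl|\mathbb{E}_{P}[g] - \mathbb{E}_{Q}[g]\bigr|
  \;\le\; 2B\,\mathrm{TV}(P, Q),
\qquad\text{hence}\qquad
\Delta R_T \;\le\; 2B \sum_{t=1}^{T} \mathrm{TV}\bigl(P_t, Q_t\bigr),
\]
% where Delta R_T denotes the additional cumulative regret incurred
% over T rounds by sampling from Q_t instead of P_t.
```

This is only the coupling-style step such analyses typically rest on; the paper's actual bound will involve its own constants and posterior constructions.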

๐Ÿ“ Abstract
This work focuses on Bayesian optimization (BO) under reward model uncertainty. We propose the first BO algorithm that achieves a no-regret guarantee in a general reward setting, requiring only Lipschitz continuity of the objective function and accommodating a broad class of measurement noise. The core of our approach is a novel surrogate model, termed the infinite Gaussian process ($\infty$-GP). It is a Bayesian nonparametric model that places a prior on the space of reward distributions, enabling it to represent a substantially broader class of reward models than the classical Gaussian process (GP). The $\infty$-GP is used in combination with Thompson Sampling (TS) to enable effective exploration and exploitation. Correspondingly, we develop a new TS regret analysis framework for general rewards, which relates the regret to the total variation distance between the surrogate model and the true reward distribution. Furthermore, with a truncated Gibbs sampling procedure, our method is computationally scalable, incurring minimal additional memory and computational complexity compared to the classical GP. Empirical results demonstrate state-of-the-art performance, particularly in settings with non-stationary, heavy-tailed, or other ill-conditioned rewards.
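To make the surrogate-plus-Thompson-Sampling loop concrete, here is a minimal sketch using a classical GP surrogate on a discretized 1D domain. This is *not* the paper's $\infty$-GP (which places a prior over reward distributions); it only illustrates the outer TS mechanic the abstract describes, on a toy objective with Gaussian noise. All function names and the test objective are illustrative choices, not from the paper.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.2, variance=1.0):
    """Squared-exponential kernel on 1D inputs."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(X, y, Xs, noise=0.1):
    """Exact GP posterior mean and covariance at test points Xs."""
    K = rbf_kernel(X, X) + noise**2 * np.eye(len(X))
    Ks = rbf_kernel(X, Xs)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, Ks)
    mu = Ks.T @ alpha
    cov = rbf_kernel(Xs, Xs) - v.T @ v
    return mu, cov

def thompson_step(rng, X, y, candidates, noise=0.1):
    """Draw one posterior sample path and play its argmax."""
    mu, cov = gp_posterior(X, y, candidates, noise)
    jitter = 1e-6 * np.eye(len(candidates))  # numerical stability
    sample = rng.multivariate_normal(mu, cov + jitter)
    return candidates[np.argmax(sample)]

# Toy objective (hypothetical): sin(3x) on [0, 1], maximized near x ~ 0.524.
rng = np.random.default_rng(0)
f = lambda x: np.sin(3.0 * x)
candidates = np.linspace(0.0, 1.0, 101)

# Warm start with a few random noisy evaluations, then run TS.
X = rng.uniform(0.0, 1.0, size=3)
y = f(X) + 0.1 * rng.standard_normal(3)
for _ in range(20):
    x_next = thompson_step(rng, X, y, candidates)
    X = np.append(X, x_next)
    y = np.append(y, f(x_next) + 0.1 * rng.standard_normal())
```

The paper's contribution replaces the Gaussian likelihood baked into `gp_posterior` with a nonparametric model of the reward distribution itself (sampled via truncated Gibbs), so the same TS loop remains valid under heavy-tailed or non-stationary noise.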
Problem

Research questions and friction points this paper is trying to address.

Bayesian optimization
reward uncertainty
nonparametric
general rewards
no-regret
Innovation

Methods, ideas, or system contributions that make the work stand out.

infinite Gaussian process
Bayesian nonparametrics
Thompson Sampling
general reward
no-regret optimization
🔎 Similar Papers
No similar papers found.