Bayesian Optimization from Human Feedback: Near-Optimal Regret Bounds

📅 2025-05-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper studies Bayesian optimization with human feedback (BOHF): efficiently identifying the global optimum action under costly pairwise binary preference queries (e.g., “Action A is preferred to Action B”). Methodologically, it employs a Gaussian process prior coupled with the Bradley–Terry–Luce (BTL) preference model, and integrates information gain analysis with an upper-confidence-bound acquisition strategy. Theoretically, it establishes the first tight regret bound of $\tilde{\mathcal{O}}(\sqrt{\Gamma(T)T})$, proving that preference feedback achieves sample efficiency comparable to scalar-valued feedback; under standard kernels, it recovers the order-optimal complexity of classical BO, significantly improving upon prior bounds. Empirically, the method attains near-optimal performance using only a small number of human preference queries.
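The BTL feedback model at the core of BOHF can be made concrete with a short sketch. The snippet below is illustrative only, not the paper's algorithm: it simulates binary preference queries under a BTL model with a hand-picked hidden utility (a stand-in for a Gaussian-process sample), then tallies wins from random duels as a naive baseline for picking a good action.

```python
import numpy as np

def btl_preference_prob(f_a, f_b):
    """BTL model: P(A preferred to B) = sigmoid(f(A) - f(B))."""
    return 1.0 / (1.0 + np.exp(-(f_a - f_b)))

rng = np.random.default_rng(0)
actions = np.linspace(0.0, 1.0, 20)   # discretized action set
f = np.sin(3.0 * actions)             # hidden utility, never observed directly

def duel(i, j):
    """One costly binary preference query: True if action i wins."""
    return rng.random() < btl_preference_prob(f[i], f[j])

# Naive baseline: tally wins over random duels, then pick the empirical winner.
# (The paper instead uses a GP posterior with a UCB acquisition rule.)
wins = np.zeros(len(actions))
for _ in range(2000):
    i, j = rng.choice(len(actions), size=2, replace=False)
    if duel(i, j):
        wins[i] += 1
    else:
        wins[j] += 1

best = int(np.argmax(wins))
```

Only the binary outcome of each duel reaches the learner, which is exactly the reduced-feedback setting the regret analysis addresses.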

📝 Abstract
Bayesian optimization (BO) with preference-based feedback has recently garnered significant attention due to its emerging applications. We refer to this problem as Bayesian Optimization from Human Feedback (BOHF), which differs from conventional BO by learning the best actions from a reduced feedback model, where only the preference between two actions is revealed to the learner at each time step. The objective is to identify the best action using a limited number of preference queries, typically obtained through costly human feedback. Existing work, which adopts the Bradley-Terry-Luce (BTL) feedback model, provides regret bounds for the performance of several algorithms. In this work, within the same framework we develop tighter performance guarantees. Specifically, we derive regret bounds of $\tilde{\mathcal{O}}(\sqrt{\Gamma(T)T})$, where $\Gamma(T)$ represents the maximum information gain, a kernel-specific complexity term, and $T$ is the number of queries. Our results significantly improve upon existing bounds. Notably, for common kernels, we show that the order-optimal sample complexities of conventional BO (achieved with richer feedback models) are recovered. In other words, the same number of preferential samples as scalar-valued samples is sufficient to find a nearly optimal solution.
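To see what the bound gives for common kernels, one can plug in the standard maximum-information-gain rates from the GP-bandit literature (these rates are well-known results assumed here, not stated in the abstract itself):

```latex
% Squared-exponential kernel in dimension d:
\Gamma(T) = \mathcal{O}\big((\log T)^{d+1}\big)
  \;\Rightarrow\;
  \tilde{\mathcal{O}}\big(\sqrt{\Gamma(T)\,T}\big)
  = \tilde{\mathcal{O}}\big(\sqrt{T}\big)

% Matérn kernel with smoothness \nu in dimension d:
\Gamma(T) = \tilde{\mathcal{O}}\big(T^{\frac{d}{2\nu+d}}\big)
  \;\Rightarrow\;
  \tilde{\mathcal{O}}\big(\sqrt{\Gamma(T)\,T}\big)
  = \tilde{\mathcal{O}}\big(T^{\frac{\nu+d}{2\nu+d}}\big)
```

These match the order-optimal rates of conventional scalar-feedback BO, which is the sense in which preferential samples are shown to be as informative, up to logarithmic factors, as scalar-valued ones.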
Problem

Research questions and friction points this paper is trying to address.

Optimizing actions using limited human preference feedback
Improving regret bounds for Bayesian optimization algorithms
Achieving near-optimal performance with preferential samples
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bayesian Optimization with human feedback
Tighter regret bounds using BTL model
Optimal sample complexities for common kernels