🤖 AI Summary
This paper investigates the adversarial performance of Thompson sampling in full-feedback online learning (i.e., prediction with expert advice), particularly when the number of experts may be infinite. To overcome the limitation of conventional Bayesian priors defined over the expert set, the authors introduce a novel formulation where the prior is placed directly over the adversary’s action space—a first in this setting—and propose an “excess regret” decomposition framework that unifies analysis for both finite and infinite action spaces. Theoretically, against adversaries that are η-bounded and λ-Lipschitz continuous, Thompson sampling with a Gaussian process prior achieves an $\mathcal{O}(\eta\sqrt{T\log(1+\lambda)})$ regret bound; in the finite-expert case, it recovers the optimal rate. This work provides the first practically implementable Bayesian online learning algorithm with provable performance guarantees for infinite expert sets.
📝 Abstract
We develop an analysis of Thompson sampling for online learning under full feedback - also known as prediction with expert advice - where the learner's prior is defined over the space of an adversary's future actions, rather than the space of experts. We show regret decomposes into regret the learner expected a priori, plus a prior-robustness-type term we call excess regret. In the classical finite-expert setting, this recovers optimal rates. As an initial step towards practical online learning in settings with a potentially-uncountably-infinite number of experts, we show that Thompson sampling with a certain Gaussian process prior widely-used in the Bayesian optimization literature has a $\mathcal{O}(\eta\sqrt{T\log(1+\lambda)})$ rate against a $\eta$-bounded $\lambda$-Lipschitz adversary.
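To make the algorithmic template concrete, here is a minimal sketch of Thompson sampling under full feedback in the finite-expert case: sample a plausible model of the adversary's losses from the posterior, play the expert that is best under the sample, observe the full loss vector, and update. This is an illustrative simplification, not the paper's construction: it uses an independent Gaussian prior on each expert's mean loss (with hypothetical `prior_var` and `noise_var` parameters) rather than the Gaussian process prior over the adversary's action space that the paper analyzes.

```python
import numpy as np

def thompson_sampling_full_feedback(loss_matrix, prior_var=1.0, noise_var=1.0, seed=0):
    """Thompson sampling for prediction with expert advice (full feedback).

    Sketch under stated assumptions: independent N(0, prior_var) prior on each
    expert's mean loss, Gaussian likelihood with variance noise_var. Returns
    the realized regret against the best fixed expert in hindsight.
    """
    rng = np.random.default_rng(seed)
    T, K = loss_matrix.shape
    # Gaussian posterior over each expert's mean loss: start at the prior.
    post_mean = np.zeros(K)
    post_var = np.full(K, prior_var)
    total_loss = 0.0
    for t in range(T):
        # Sample a plausible mean-loss vector from the current posterior ...
        sampled = rng.normal(post_mean, np.sqrt(post_var))
        # ... and play the expert that is optimal under the sample.
        action = int(np.argmin(sampled))
        losses = loss_matrix[t]  # full feedback: every expert's loss is revealed
        total_loss += losses[action]
        # Conjugate Gaussian update for ALL experts, since feedback is full.
        precision = 1.0 / post_var + 1.0 / noise_var
        post_mean = (post_mean / post_var + losses / noise_var) / precision
        post_var = 1.0 / precision
    # Regret relative to the single best expert in hindsight.
    return total_loss - loss_matrix.sum(axis=0).min()
```

On a loss sequence where one expert is consistently better, the posterior concentrates quickly and the realized regret stays small; the paper's analysis quantifies this via the a-priori-expected regret plus the excess-regret term.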