🤖 AI Summary
This paper investigates the adversarial performance of Thompson sampling in full-feedback online learning (i.e., prediction with expert advice), particularly when the number of experts may be infinite. To overcome the limitation of conventional Bayesian priors defined over the expert set, the authors introduce a novel formulation where the prior is placed directly over the adversary’s action space—a first in this setting—and propose an “excess regret” decomposition framework that unifies analysis for both finite and infinite action spaces. Theoretically, against adversaries that are η-bounded and λ-Lipschitz continuous, Thompson sampling with a Gaussian process prior achieves an $\mathcal{O}(\eta\sqrt{T\log(1+\lambda)})$ regret bound; in the finite-expert case, it recovers the optimal rate. This work provides the first practically implementable Bayesian online learning algorithm with provable performance guarantees for infinite expert sets.
📝 Abstract
We develop an analysis of Thompson sampling for online learning under full feedback - also known as prediction with expert advice - where the learner's prior is defined over the space of an adversary's future actions, rather than the space of experts. We show regret decomposes into regret the learner expected a priori, plus a prior-robustness-type term we call excess regret. In the classical finite-expert setting, this recovers optimal rates. As an initial step towards practical online learning in settings with a potentially-uncountably-infinite number of experts, we show that Thompson sampling with a certain Gaussian process prior widely-used in the Bayesian optimization literature has a $\mathcal{O}(\eta\sqrt{T\log(1+\lambda)})$ rate against a $\eta$-bounded $\lambda$-Lipschitz adversary.
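To make the algorithmic template concrete, here is a minimal sketch of Thompson sampling under full feedback in the finite-expert case: sample a plausible model of the adversary's losses from the posterior, play the expert that is best under the sample, observe the full loss vector, and update. This is an illustrative simplification, not the paper's construction: it uses an independent Gaussian prior on each expert's mean loss (with hypothetical `prior_var` and `noise_var` parameters) rather than the Gaussian process prior over the adversary's action space that the paper analyzes.

```python
import numpy as np

def thompson_sampling_full_feedback(loss_matrix, prior_var=1.0, noise_var=1.0, seed=0):
    """Thompson sampling for prediction with expert advice (full feedback).

    Sketch under stated assumptions: independent N(0, prior_var) prior on each
    expert's mean loss, Gaussian likelihood with variance noise_var. Returns
    the realized regret against the best fixed expert in hindsight.
    """
    rng = np.random.default_rng(seed)
    T, K = loss_matrix.shape
    # Gaussian posterior over each expert's mean loss: start at the prior.
    post_mean = np.zeros(K)
    post_var = np.full(K, prior_var)
    total_loss = 0.0
    for t in range(T):
        # Sample a plausible mean-loss vector from the current posterior ...
        sampled = rng.normal(post_mean, np.sqrt(post_var))
        # ... and play the expert that is optimal under the sample.
        action = int(np.argmin(sampled))
        losses = loss_matrix[t]  # full feedback: every expert's loss is revealed
        total_loss += losses[action]
        # Conjugate Gaussian update for ALL experts, since feedback is full.
        precision = 1.0 / post_var + 1.0 / noise_var
        post_mean = (post_mean / post_var + losses / noise_var) / precision
        post_var = 1.0 / precision
    # Regret relative to the single best expert in hindsight.
    return total_loss - loss_matrix.sum(axis=0).min()
```

On a loss sequence where one expert is consistently better, the posterior concentrates quickly and the realized regret stays small; the paper's analysis quantifies this via the a-priori-expected regret plus the excess-regret term.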