Prior Diffusiveness and Regret in the Linear-Gaussian Bandit

📅 2026-01-05
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work characterizes the Bayesian regret of Thompson sampling in the linear-Gaussian bandit with a $\mathcal{N}(\mu_0, \Sigma_0)$ prior on the coefficients, focusing on how the prior interacts with long-run performance. We show that, to within logarithmic factors, the regret decomposes additively into a prior-dependent "burn-in" term $d r \sqrt{\mathrm{Tr}(\Sigma_0)}$ and the minimax long-run term $\sigma d \sqrt{T}$, whereas previous regret bounds exhibit a multiplicative dependence on these terms. The resulting upper bound is $\tilde{O}(\sigma d \sqrt{T} + d r \sqrt{\mathrm{Tr}(\Sigma_0)})$, where $d$ is the dimension, $T$ is the time horizon, $r$ is the maximum $\ell_2$ norm of the actions, and $\sigma^2$ is the noise variance. The proof relies on a new elliptical potential lemma, and a lower bound indicates that the burn-in term is unavoidable.

📝 Abstract
We prove that Thompson sampling exhibits $\tilde{O}(\sigma d \sqrt{T} + d r \sqrt{\mathrm{Tr}(\Sigma_0)})$ Bayesian regret in the linear-Gaussian bandit with a $\mathcal{N}(\mu_0, \Sigma_0)$ prior distribution on the coefficients, where $d$ is the dimension, $T$ is the time horizon, $r$ is the maximum $\ell_2$ norm of the actions, and $\sigma^2$ is the noise variance. In contrast to existing regret bounds, this shows that, to within logarithmic factors, the prior-dependent "burn-in" term $d r \sqrt{\mathrm{Tr}(\Sigma_0)}$ decouples additively from the minimax (long-run) regret $\sigma d \sqrt{T}$. Previous regret bounds exhibit a multiplicative dependence on these terms. We establish these results via a new "elliptical potential" lemma, and also provide a lower bound indicating that the burn-in term is unavoidable.
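To make the setting concrete, here is a minimal sketch of Thompson sampling in a linear-Gaussian bandit, using the standard conjugate Gaussian posterior update. It is an illustration under stated assumptions, not the paper's implementation: the finite action set, the function name `thompson_sampling`, and all numeric values are hypothetical choices. A single run with the coefficients drawn from the prior gives one sample of the regret; the Bayesian regret in the abstract is the expectation of this quantity over the prior and the noise.

```python
import numpy as np

def thompson_sampling(actions, theta_star, mu0, Sigma0, sigma, T, rng):
    """Return cumulative regret after each of T rounds (hypothetical helper)."""
    mu, Sigma = mu0.copy(), Sigma0.copy()
    means = actions @ theta_star          # true mean reward of every action
    best = means.max()
    regret = np.zeros(T)
    total = 0.0
    for t in range(T):
        # Sample coefficients from the current posterior N(mu, Sigma).
        theta_tilde = rng.multivariate_normal(mu, Sigma)
        # Play the action that is optimal for the sampled coefficients.
        idx = int(np.argmax(actions @ theta_tilde))
        a = actions[idx]
        reward = means[idx] + sigma * rng.standard_normal()
        # Rank-one conjugate Gaussian update of the posterior.
        Sa = Sigma @ a
        denom = sigma**2 + a @ Sa
        mu = mu + Sa * (reward - a @ mu) / denom
        Sigma = Sigma - np.outer(Sa, Sa) / denom
        Sigma = (Sigma + Sigma.T) / 2     # guard against numerical asymmetry
        total += best - means[idx]
        regret[t] = total
    return regret

# Assumed toy instance: d = 3, 20 actions rescaled to l2 norm r, diffuse prior.
rng = np.random.default_rng(0)
d, r, sigma, T = 3, 1.0, 0.5, 200
actions = rng.standard_normal((20, d))
actions *= r / np.linalg.norm(actions, axis=1, keepdims=True)
mu0, Sigma0 = np.zeros(d), 4.0 * np.eye(d)
theta_star = rng.multivariate_normal(mu0, Sigma0)   # coefficients drawn from the prior
regret = thompson_sampling(actions, theta_star, mu0, Sigma0, sigma, T, rng)
```

Scaling `Sigma0` up makes the prior more diffuse, which is the regime where the burn-in term $d r \sqrt{\mathrm{Tr}(\Sigma_0)}$ matters; averaging `regret` over many independent draws of `theta_star` would give a Monte Carlo estimate of the Bayesian regret.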
Problem

Research questions and friction points this paper is trying to address.

linear-Gaussian bandit
Thompson sampling
Bayesian regret
prior dependence
burn-in term
Innovation

Methods, ideas, or system contributions that make the work stand out.

Thompson sampling
Bayesian regret
linear-Gaussian bandit
elliptical potential
prior-dependent regret