When and why randomised exploration works (in linear bandits)

📅 2025-02-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper investigates the conditions and mechanisms under which randomised exploration, specifically Thompson sampling, achieves optimal regret bounds in linear bandits. Focusing on $d$-dimensional linear environments with a smooth, strongly convex, compact action set, the authors establish, for the first time, that Thompson sampling attains the tight regret upper bound $\mathcal{O}(d\sqrt{n}\log n)$ without requiring forced optimism or posterior inflation. The analysis integrates Bayesian randomised reasoning, a high-dimensional geometric characterisation of the action space, and refined statistical inference techniques, revealing that the curvature of the action set positively regulates exploration efficiency. This result not only confirms the optimal dimension scaling of randomised strategies in structured linear bandits but also removes the theoretical reliance on deterministic or optimism-based mechanisms, offering a new lens on the intrinsic efficacy of Bayesian exploration in sequential decision-making under uncertainty.

📝 Abstract
We provide an approach for the analysis of randomised exploration algorithms like Thompson sampling that does not rely on forced optimism or posterior inflation. With this, we demonstrate that in the $d$-dimensional linear bandit setting, when the action space is smooth and strongly convex, randomised exploration algorithms enjoy an $n$-step regret bound of the order $O(d\sqrt{n}\log(n))$. Notably, this shows for the first time that there exist non-trivial linear bandit settings where Thompson sampling can achieve optimal dimension dependence in the regret.
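To make the setting concrete, here is a minimal sketch of Thompson sampling in a linear bandit with a smooth, strongly convex action set (the unit sphere). This is an illustrative toy, not the paper's analysis: the parameter `theta_star`, the noise level `sigma`, and the Gaussian posterior with ridge regularisation are all assumptions chosen for the example; crucially, the posterior is sampled as-is, with no inflation or forced optimism.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 2000
sigma, lam = 0.1, 1.0          # noise level and ridge regulariser (assumed)

theta_star = rng.normal(size=d)
theta_star /= np.linalg.norm(theta_star)   # unknown parameter, unit norm

# Action set = unit sphere (smooth, strongly convex): the best response
# to any nonzero parameter vector is simply its normalisation.
def best_action(theta):
    return theta / np.linalg.norm(theta)

V = lam * np.eye(d)            # regularised Gram matrix
b = np.zeros(d)                # sum of reward-weighted actions
regret = 0.0

for t in range(n):
    theta_hat = np.linalg.solve(V, b)      # ridge / posterior mean estimate
    V_inv = np.linalg.inv(V)
    # Thompson sampling: draw from the (un-inflated) Gaussian posterior.
    theta_tilde = rng.multivariate_normal(theta_hat, sigma**2 * V_inv)
    a = best_action(theta_tilde)
    r = a @ theta_star + sigma * rng.normal()
    V += np.outer(a, a)
    b += r * a
    # Optimal reward is ||theta_star|| = 1.
    regret += 1.0 - a @ theta_star
```

On the sphere, curvature ties the suboptimality of an action to its distance from the optimum, which is the geometric mechanism the paper exploits; the cumulative `regret` here grows sublinearly in `n`.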
Problem

Research questions and friction points this paper is trying to address.

When and why do randomised exploration algorithms work in linear bandits?
Can Thompson sampling achieve optimal regret without forced optimism or posterior inflation?
Can randomised strategies attain optimal dimension dependence in the regret?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Randomised exploration without forced optimism
Thompson sampling in linear bandits
Optimal dimension dependence in regret