A Perturbation Approach to Unconstrained Linear Bandits

📅 2026-03-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates static and dynamic regret minimization for unconstrained Bandit Linear Optimization (uBLO) in adversarial environments. By introducing a perturbation mechanism, the uBLO problem is reduced to a standard Online Linear Optimization (OLO) setting, enabling the integration of comparator-adaptive OLO algorithms. The proposed approach achieves, for the first time and without any prior knowledge, high-probability optimal guarantees for both static and dynamic regret: the dynamic regret scales as $\tilde{O}(\sqrt{P_T})$ with respect to the path-length $P_T$, and an $\Omega(\sqrt{dT})$ lower bound is established for adversarial linear bandits over the unit Euclidean ball. This study presents the first high-probability bounds simultaneously covering static and dynamic regret, while also improving upon existing expected-regret analyses.
📝 Abstract
We revisit the standard perturbation-based approach of Abernethy et al. (2008) in the context of unconstrained Bandit Linear Optimization (uBLO). We show the surprising result that in the unconstrained setting, this approach effectively reduces Bandit Linear Optimization (BLO) to a standard Online Linear Optimization (OLO) problem. Our framework improves on prior work in several ways. First, we derive expected-regret guarantees when our perturbation scheme is combined with comparator-adaptive OLO algorithms, leading to new insights about the impact of different adversarial models on the resulting comparator-adaptive rates. We also extend our analysis to dynamic regret, obtaining the optimal $\sqrt{P_T}$ path-length dependencies without prior knowledge of $P_T$. We then develop the first high-probability guarantees for both static and dynamic regret in uBLO. Finally, we discuss lower bounds on the static regret, and prove the folklore $\Omega(\sqrt{dT})$ rate for adversarial linear bandits on the unit Euclidean ball, which is of independent interest.
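The reduction rests on the classical one-point trick for bandit linear losses: play a randomly perturbed point, then rescale the single scalar loss observation into an unbiased estimate of the hidden loss vector, which can be fed to any OLO algorithm. A minimal Monte Carlo sketch of that estimator is below; it is illustrative only (the paper's actual scheme and constants may differ), and `theta`, `y`, and `delta` are hypothetical placeholders, not quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, delta, n = 2, 1.0, 100_000       # dimension, perturbation radius, Monte Carlo rounds
theta = np.array([0.6, -0.8])       # hidden linear loss vector (unknown to the learner)
y = np.array([0.3, 0.1])            # the OLO iterate the learner would like to play

def estimate(y):
    """One-point perturbation estimator for a linear loss.

    Play x = y + u with u uniform on the radius-delta sphere, observe only
    the scalar loss <theta, x>, and return (d / delta^2) * loss * u.
    Since E[u u^T] = (delta^2 / d) I and E[u] = 0, this is unbiased for theta.
    """
    g = rng.standard_normal(d)
    u = delta * g / np.linalg.norm(g)   # uniform on the sphere of radius delta
    loss = theta @ (y + u)              # the only feedback a bandit learner sees
    return (d / delta**2) * loss * u

avg = np.mean([estimate(y) for _ in range(n)], axis=0)
print(avg)  # Monte Carlo average concentrates around theta
```

Averaging many independent estimates recovers `theta`, which is the sense in which the bandit learner can simulate full-information OLO feedback from scalar observations alone.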
Problem

Research questions and friction points this paper is trying to address.

unconstrained Bandit Linear Optimization
regret minimization
online linear optimization
adversarial bandits
dynamic regret
Innovation

Methods, ideas, or system contributions that make the work stand out.

perturbation-based approach
comparator-adaptive
dynamic regret
high-probability guarantees
linear bandits
Andrew Jacobsen
Università degli Studi di Milano, Politecnico di Milano
Dorian Baudry
Inria, Univ. Grenoble Alpes, Grenoble INP, CNRS, LIG, 38000 Grenoble, France
Shinji Ito
The University of Tokyo
Nicolò Cesa-Bianchi
Professor of Computer Science, Università degli Studi di Milano and Politecnico di Milano
Machine Learning · Learning Theory · Online Learning · Multi-Armed Bandits