Self-Concordant Perturbations for Linear Bandits

📅 2025-10-28
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This paper studies regret minimization in adversarial linear bandits over the $d$-dimensional hypercube and the Euclidean ball. We propose a unified algorithmic framework that integrates self-concordant perturbations into the Follow-the-Perturbed-Leader (FTPL) paradigm, bridging FTPL and Follow-the-Regularized-Leader (FTRL) in the partial-information (bandit) setting. Our design enables efficient stochastic exploration without projections or second-order optimization. On the hypercube, we achieve a regret bound of $O(d\sqrt{n \ln n})$. This improves on existing self-concordant FTRL methods by a $\sqrt{d}$ factor and matches the optimal bound up to logarithmic factors. On the Euclidean ball, we attain the same bound, matching the rate of existing self-concordant FTRL methods. The key innovation is the development of self-concordant perturbations: distributions that play, within FTPL, the role that self-concordant barriers play in the FTRL-based SCRiBLe algorithm. This yields a conceptually simple FTPL approach to linear bandits with near-optimal regret.
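For reference, the bounds above use the standard notion of regret in adversarial linear bandits. The following is a minimal statement, with notation assumed here rather than taken from the paper. Over $n$ rounds, an adversary picks loss vectors $\theta_t \in \mathbb{R}^d$. The learner plays $X_t$ in the action set $\mathcal{K}$ (the hypercube or the Euclidean ball) and observes only the scalar loss $\langle \theta_t, X_t \rangle$:

```latex
\mathcal{R}_n \;=\; \mathbb{E}\Big[\sum_{t=1}^{n} \langle \theta_t, X_t \rangle\Big]
\;-\; \min_{x \in \mathcal{K}} \sum_{t=1}^{n} \langle \theta_t, x \rangle ,
\qquad \text{claimed here: } \mathcal{R}_n = O\big(d\sqrt{n \ln n}\big).
```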

πŸ“ Abstract
We study the adversarial linear bandits problem and present a unified algorithmic framework that bridges Follow-the-Regularized-Leader (FTRL) and Follow-the-Perturbed-Leader (FTPL) methods, extending the known connection between them from the full-information setting. Within this framework, we introduce self-concordant perturbations, a family of probability distributions that mirror the role of self-concordant barriers previously employed in the FTRL-based SCRiBLe algorithm. Using this idea, we design a novel FTPL-based algorithm that combines self-concordant regularization with efficient stochastic exploration. Our approach achieves a regret of $O(d\sqrt{n \ln n})$ on both the $d$-dimensional hypercube and the Euclidean ball. On the Euclidean ball, this matches the rate attained by existing self-concordant FTRL methods. For the hypercube, this represents a $\sqrt{d}$ improvement over these methods and matches the optimal bound up to logarithmic factors.
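The full-information FTPL/FTRL connection that the abstract extends can be seen on a toy example. The sketch below is a generic illustration, not the paper's algorithm. It makes the following assumptions:

* The action set is $[-1,1]^d$.
* Perturbations are Gaussian, $Z \sim \mathcal{N}(0, \sigma^2 I)$, rather than the self-concordant perturbations the paper introduces.
* The score vector `theta` stands for negated cumulative losses.
* All function names are hypothetical.

The code checks numerically that the mean FTPL action equals the gradient of the smoothed potential $\Phi(\theta) = \mathbb{E}[\max_{x} \langle \theta + Z, x \rangle]$. That gradient is the point an FTRL step with regularizer $\Phi^*$ would select.

```python
import math
import random


def ftpl_action(theta, rng, sigma=1.0):
    """One FTPL draw on the (assumed) action set [-1, 1]^d.

    Maximizes <theta + Z, x> over [-1, 1]^d with Z ~ N(0, sigma^2 I).
    A linear function is maximized at a vertex: x_i = sign(theta_i + Z_i).
    """
    return [1.0 if t + rng.gauss(0.0, sigma) >= 0.0 else -1.0 for t in theta]


def expected_action(theta, sigma=1.0):
    """Closed-form mean FTPL action, coordinate by coordinate.

    E[sign(theta_i + Z_i)] = 2 * Phi_N(theta_i / sigma) - 1 = erf(theta_i / (sigma * sqrt(2))),
    which is the gradient of the smoothed potential E[sum_i |theta_i + Z_i|].
    """
    return [math.erf(t / (sigma * math.sqrt(2.0))) for t in theta]


def monte_carlo_mean(theta, samples=100_000, sigma=1.0, seed=0):
    """Empirical average of independent FTPL draws, used to check the closed form."""
    rng = random.Random(seed)
    totals = [0.0] * len(theta)
    for _ in range(samples):
        for i, x in enumerate(ftpl_action(theta, rng, sigma)):
            totals[i] += x
    return [total / samples for total in totals]


if __name__ == "__main__":
    theta = [0.5, -1.0, 0.0]  # hypothetical negated cumulative losses
    print(expected_action(theta))   # approximately [0.383, -0.683, 0.0]
    print(monte_carlo_mean(theta))  # close to the closed form, up to sampling noise
```

The closed form exists only because both the hypercube and the Gaussian perturbation factorize over coordinates. The bandit setting studied in the paper also requires estimating the loss vector from a single observed scalar per round, which this sketch does not address.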
Problem

Research questions and friction points this paper is trying to address.

Unifying FTRL and FTPL methods for adversarial linear bandits
Introducing self-concordant perturbations to mirror FTRL barriers
Achieving improved regret bounds on hypercube and Euclidean ball
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified framework bridging FTRL and FTPL methods
Introduced self-concordant perturbations for stochastic exploration
Combined self-concordant regularization with efficient exploration
Lucas Lévy
University of Oxford, United Kingdom; École Polytechnique, IP Paris, France
Jean-Lou Valeau
University of Oxford, United Kingdom; ENSAE, IP Paris, France
Arya Akhavan
University of Oxford
Patrick Rebeschini
University of Oxford, United Kingdom