AI Summary
This paper studies regret minimization in adversarial linear bandits over the $d$-dimensional hypercube and the Euclidean ball. We propose a unified algorithmic framework that, for the first time, integrates self-concordant perturbations into the Follow-the-Perturbed-Leader (FTPL) paradigm, thereby extending the known connection between FTPL and Follow-the-Regularized-Leader (FTRL) from the full-information setting to the partial-information setting. Our design enables efficient stochastic exploration without projections or second-order optimization. On both the hypercube and the Euclidean ball, we achieve a regret bound of $O(d\sqrt{n \ln n})$. On the Euclidean ball, this matches the rate attained by existing self-concordant FTRL methods; on the hypercube, it improves on those methods by a $\sqrt{d}$ factor and matches the optimal bound up to logarithmic factors. The key innovation lies in the theoretical development and application of self-concordant perturbations within FTPL, yielding an approach to linear bandits that is conceptually simple and, on the hypercube, optimal up to logarithmic factors.
Abstract
We study the adversarial linear bandits problem and present a unified algorithmic framework that bridges Follow-the-Regularized-Leader (FTRL) and Follow-the-Perturbed-Leader (FTPL) methods, extending the known connection between them from the full-information setting. Within this framework, we introduce self-concordant perturbations, a family of probability distributions that mirror the role of self-concordant barriers previously employed in the FTRL-based SCRiBLe algorithm. Using this idea, we design a novel FTPL-based algorithm that combines self-concordant regularization with efficient stochastic exploration. Our approach achieves a regret of $O(d\sqrt{n \ln n})$ on both the $d$-dimensional hypercube and the Euclidean ball. On the Euclidean ball, this matches the rate attained by existing self-concordant FTRL methods. For the hypercube, this represents a $\sqrt{d}$ improvement over these methods and matches the optimal bound up to logarithmic factors.
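The full-information connection between FTPL and FTRL mentioned above can be illustrated with a minimal sketch. It is not the paper's bandit algorithm and does not use self-concordant perturbations. It assumes the hypercube $[-1, 1]^d$, a standard logistic perturbation, and the entropic regularizer $R(x) = \sum_i \left[(1+x_i)\ln(1+x_i) + (1-x_i)\ln(1-x_i)\right]$, all chosen here for illustration only. Under these assumptions, the expected FTPL action $\mathbb{E}[\operatorname{sign}(Z - \eta L)]$ equals the FTRL iterate $-\tanh(\eta L / 2)$ coordinate-wise. The function names and the step size `eta` are hypothetical.

```python
import math
import random


def sample_logistic(rng):
    """Draw one standard logistic variate by inverting its CDF."""
    u = rng.random()
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep the logit finite
    return math.log(u / (1.0 - u))


def ftpl_action(cum_loss, eta, rng):
    """One FTPL draw on the hypercube [-1, 1]^d (assumed setting).

    Minimizing <a, eta * L - Z> over the cube decouples per coordinate,
    giving a_i = sign(Z_i - eta * L_i).
    """
    return [1.0 if sample_logistic(rng) > eta * l else -1.0 for l in cum_loss]


def ftrl_action(cum_loss, eta):
    """FTRL iterate for the assumed entropic regularizer.

    Solving eta * L_i + R'(x_i) = 0 with R'(x) = 2 * artanh(x)
    gives x_i = -tanh(eta * L_i / 2).
    """
    return [-math.tanh(eta * l / 2.0) for l in cum_loss]


def mean_ftpl_action(cum_loss, eta, n_samples, seed=0):
    """Monte Carlo estimate of the expected FTPL action."""
    rng = random.Random(seed)
    total = [0.0] * len(cum_loss)
    for _ in range(n_samples):
        action = ftpl_action(cum_loss, eta, rng)
        total = [t + a for t, a in zip(total, action)]
    return [t / n_samples for t in total]


if __name__ == "__main__":
    cum_loss = [2.0, -1.0, 0.5]  # hypothetical cumulative loss vector
    print("FTPL mean:", mean_ftpl_action(cum_loss, eta=0.5, n_samples=20000))
    print("FTRL     :", ftrl_action(cum_loss, eta=0.5))
```

With enough samples, the two printed vectors should agree up to Monte Carlo error. In the bandit setting the learner observes only the loss of the played action, so the cumulative loss vector would have to be replaced by an estimate; that step is not shown here.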