Self-Concordant Perturbations for Linear Bandits

📅 2025-10-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper studies regret minimization in adversarial linear bandits over the $d$-dimensional hypercube and the Euclidean ball. It proposes a unified algorithmic framework that integrates self-concordant perturbations into the Follow-the-Perturbed-Leader (FTPL) paradigm, extending the known full-information connection between FTPL and Follow-the-Regularized-Leader (FTRL) to the partial-information (bandit) setting. The resulting FTPL-based algorithm combines self-concordant regularization with efficient stochastic exploration and achieves a regret of $O(d\sqrt{n \ln n})$ on both action sets. On the Euclidean ball, this matches the rate of existing self-concordant FTRL methods; on the hypercube, it improves on those methods by a $\sqrt{d}$ factor and matches the optimal bound up to logarithmic factors. The key contribution is the introduction of self-concordant perturbations, a family of probability distributions that play the same role in FTPL that self-concordant barriers play in the FTRL-based SCRiBLe algorithm.

📝 Abstract

We study the adversarial linear bandits problem and present a unified algorithmic framework that bridges Follow-the-Regularized-Leader (FTRL) and Follow-the-Perturbed-Leader (FTPL) methods, extending the known connection between them from the full-information setting. Within this framework, we introduce self-concordant perturbations, a family of probability distributions that mirror the role of self-concordant barriers previously employed in the FTRL-based SCRiBLe algorithm. Using this idea, we design a novel FTPL-based algorithm that combines self-concordant regularization with efficient stochastic exploration. Our approach achieves a regret of $O(d\sqrt{n \ln n})$ on both the $d$-dimensional hypercube and the Euclidean ball. On the Euclidean ball, this matches the rate attained by existing self-concordant FTRL methods. For the hypercube, this represents a $\sqrt{d}$ improvement over these methods and matches the optimal bound up to logarithmic factors.
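
The full-information connection between FTPL and FTRL that the abstract refers to can be illustrated with a small numerical check on the hypercube. The sketch below is not the paper's algorithm and does not use self-concordant perturbations. It assumes i.i.d. standard logistic noise, chosen here only because the expected FTPL action then has a closed form. All function names and the example loss vector are hypothetical.

```python
import math
import random


def sample_logistic(rng):
    # Inverse-CDF sampling of a standard logistic variable (illustrative choice).
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    return math.log(u / (1.0 - u))


def ftpl_action(cum_loss, rng):
    # Perturbed leader on {-1, +1}^d: minimise <x, L + Z> coordinate by coordinate.
    return [-1.0 if l + sample_logistic(rng) > 0 else 1.0 for l in cum_loss]


def ftpl_mean_action(cum_loss, n_samples, rng):
    # Monte Carlo estimate of the expected FTPL action.
    total = [0.0] * len(cum_loss)
    for _ in range(n_samples):
        for i, x_i in enumerate(ftpl_action(cum_loss, rng)):
            total[i] += x_i
    return [t / n_samples for t in total]


def ftrl_action(cum_loss):
    # FTRL on [-1, 1]^d with the entropy-like regulariser
    # R(x) = sum_i (1 + x_i) ln(1 + x_i) + (1 - x_i) ln(1 - x_i).
    # Setting L_i + dR/dx_i = L_i + 2 atanh(x_i) = 0 gives x_i = -tanh(L_i / 2).
    return [-math.tanh(l / 2.0) for l in cum_loss]


rng = random.Random(0)
cum_loss = [-1.5, -0.2, 0.0, 0.7]  # hypothetical cumulative loss vector
mc = ftpl_mean_action(cum_loss, 50_000, rng)
exact = ftrl_action(cum_loss)
max_gap = max(abs(a - b) for a, b in zip(mc, exact))
print("FTPL mean action:", [round(v, 3) for v in mc])
print("FTRL action:     ", [round(v, 3) for v in exact])
print("max gap:", max_gap)
```

Under these assumptions the two printed vectors should agree up to Monte Carlo error: the expected perturbed leader coincides with a regularized leader for a suitable regularizer. In the bandit setting the losses are not observed directly, and the paper's framework addresses that case.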

Problem

Research questions and friction points this paper is trying to address.

Unifying FTRL and FTPL methods for adversarial linear bandits

Introducing self-concordant perturbations to mirror FTRL barriers

Achieving improved regret bounds on hypercube and Euclidean ball

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified framework bridging FTRL and FTPL methods

Introduced self-concordant perturbations for stochastic exploration

Combined self-concordant regularization with efficient exploration

🔎 Similar Papers

Boosting Perturbed Gradient Ascent for Last-Iterate Convergence in Games