A Further Efficient Algorithm with Best-of-Both-Worlds Guarantees for $m$-Set Semi-Bandit Problem

📅 2026-03-12
🤖 AI Summary
This work addresses the $m$-set semi-bandit problem with a Follow-the-Perturbed-Leader (FTPL) algorithm that uses Fréchet and Pareto perturbations and, for the first time for FTPL, achieves Best-of-Both-Worlds optimality across adversarial and stochastic environments. Specifically, the algorithm attains the optimal $O(\sqrt{mdT})$ regret bound in the adversarial setting and logarithmic regret in the stochastic setting. To improve computational efficiency without weakening these guarantees, the authors extend conditional geometric resampling to $m$-set semi-bandits, reducing the cost of loss estimation from the $O(d^2)$ of the original geometric resampling to $O(md(\log(d/m)+1))$ without sacrificing regret performance.
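The FTPL decision rule summarized above can be sketched in Python. This is a minimal illustration, assuming the standard FTPL form in which the learner plays the $m$ arms with the smallest perturbed scores $\hat L_{t,i} - r_{t,i}/\eta_t$, where $\hat L_t$ is the cumulative estimated loss and the $r_{t,i}$ are i.i.d. Fréchet or Pareto perturbations. The function names, the shape parameter `alpha`, and the learning rate `eta` are hypothetical; the paper's specific parameter choices are not reproduced here.

```python
import heapq
import math
import random


def sample_frechet(alpha: float, rng: random.Random) -> float:
    """Inverse-CDF draw from Fréchet(alpha): F(x) = exp(-x**(-alpha)) for x > 0."""
    u = rng.random()
    while u == 0.0:  # keep u in (0, 1) so that log(u) is finite
        u = rng.random()
    return (-math.log(u)) ** (-1.0 / alpha)


def sample_pareto(alpha: float, rng: random.Random) -> float:
    """Draw from Pareto(alpha) with support [1, inf), via the standard library."""
    return rng.paretovariate(alpha)


def ftpl_select(L_hat, m, eta, alpha, rng, dist="frechet"):
    """Hypothetical FTPL step: return the m arms minimizing L_hat[i] - r[i] / eta.

    The decision rule is an assumed, standard FTPL form, not taken from the paper.
    """
    sampler = sample_frechet if dist == "frechet" else sample_pareto
    r = [sampler(alpha, rng) for _ in L_hat]  # one i.i.d. perturbation per arm
    scores = [L - ri / eta for L, ri in zip(L_hat, r)]
    # Over all size-m subsets, the linear argmin is simply the m smallest scores.
    return sorted(heapq.nsmallest(m, range(len(L_hat)), key=scores.__getitem__))
```

Because every size-$m$ subset is feasible, the linear argmin over the action set reduces to picking the $m$ smallest scores, which `heapq.nsmallest` does in $O(d \log m)$ time instead of enumerating all $\binom{d}{m}$ subsets. For example, `ftpl_select([3.0, 0.5, 2.0, 0.1, 4.0], m=2, eta=0.1, alpha=2.0, rng=random.Random(0))` returns a sorted list of two distinct arm indices.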

📝 Abstract
This paper studies the optimality and complexity of the Follow-the-Perturbed-Leader (FTPL) policy in $m$-set semi-bandit problems. FTPL has been studied extensively as a promising candidate for an efficient algorithm with favorable regret in adversarial combinatorial semi-bandits. Nevertheless, the optimality of FTPL has remained unknown, unlike that of Follow-the-Regularized-Leader (FTRL), whose optimality has been proven for various online learning tasks. In this paper, we extend the analysis of FTPL with geometric resampling (GR) to $m$-set semi-bandits, a special case of combinatorial semi-bandits, and show that FTPL with Fréchet and Pareto distributions with certain parameters achieves the best possible regret of $O(\sqrt{mdT})$ in the adversarial setting. We also show that FTPL with Fréchet and Pareto distributions with a certain parameter achieves logarithmic regret in the stochastic setting, which establishes the Best-of-Both-Worlds optimality of FTPL for $m$-set semi-bandit problems. Furthermore, we extend conditional geometric resampling to $m$-set semi-bandits for efficient loss estimation in FTPL, reducing the computational complexity from the $O(d^2)$ of the original geometric resampling to $O(md(\log(d/m)+1))$ without sacrificing regret performance.
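Geometric resampling, the loss estimator the abstract builds on, can be illustrated with a minimal Python sketch of the original (unconditional) procedure: after playing a set, the learner redraws fresh perturbations until each played arm is selected again, and uses the redraw count $K_i$ as an estimate of $1/p_i$, where $p_i$ is the probability that arm $i$ is played. The helper names, the Fréchet-only sampler, the assumed FTPL rule, and the cap `max_iter` are illustrative assumptions; the conditional variant proposed in the paper is not reproduced here.

```python
import heapq
import math
import random


def frechet(alpha: float, rng: random.Random) -> float:
    """Inverse-CDF draw from Fréchet(alpha): F(x) = exp(-x**(-alpha)), x > 0."""
    u = rng.random()
    while u == 0.0:  # keep u in (0, 1) so that log(u) is finite
        u = rng.random()
    return (-math.log(u)) ** (-1.0 / alpha)


def select_m(L_hat, m, eta, alpha, rng):
    """Assumed FTPL rule (hypothetical helper): the m arms minimizing L_hat[i] - r[i] / eta."""
    scores = [L - frechet(alpha, rng) / eta for L in L_hat]
    return set(heapq.nsmallest(m, range(len(L_hat)), key=scores.__getitem__))


def geometric_resampling(L_hat, played, m, eta, alpha, rng, max_iter=1000):
    """Original GR: K[i] = index of the first redraw whose FTPL set contains arm i.

    Uncapped, K[i] is geometric with mean 1 / P(i is played), so losses[i] * K[i]
    on played arms is an unbiased estimate of losses[i]; capping at max_iter
    introduces a small downward bias.
    """
    K = {i: max_iter for i in played}  # value kept if arm i is never redrawn
    pending = set(played)
    for k in range(1, max_iter + 1):
        if not pending:
            break
        hit = pending & select_m(L_hat, m, eta, alpha, rng)  # one full redraw over all d arms
        for i in hit:
            K[i] = k
        pending -= hit
    return K


def ftpl_gr_round(L_hat, losses, m, eta, alpha, rng):
    """One hypothetical FTPL round: play a set, observe semi-bandit feedback, update L_hat."""
    played = select_m(L_hat, m, eta, alpha, rng)
    K = geometric_resampling(L_hat, played, m, eta, alpha, rng)
    for i in played:  # only the played arms' losses are observed
        L_hat[i] += losses[i] * K[i]
    return played
```

In this sketch every redraw samples all $d$ perturbations, and the loop runs until the least likely played arm is hit, so the per-round cost grows with $\max_i K_i$. Reducing this overhead is what the conditional geometric resampling described in the abstract targets.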
Problem

Research questions and friction points this paper is trying to address.

m-set semi-bandit
FTPL
best-of-both-worlds
regret optimality
computational complexity
Innovation

Methods, ideas, or system contributions that make the work stand out.

FTPL
Best-of-Both-Worlds
m-set semi-bandit
geometric resampling
regret optimality
Botao Chen
Department of Systems Science, Graduate School of Informatics, Kyoto University, Kyoto, Japan
Jongyeong Lee
Computational Science Research Center, Korea Institute of Science and Technology, Seoul, Korea
Chansoo Kim
Computational Science Research Center, Korea Institute of Science and Technology & University of Science and Technology, Seoul, Korea
Junya Honda
Kyoto University / RIKEN
Information Theory, Machine Learning