🤖 AI Summary
This paper investigates the optimality and computational complexity of the Follow-the-Perturbed-Leader (FTPL) algorithm in size-invariant combinatorial semi-bandits. It addresses two open issues in this setting: whether FTPL is optimal, and the high per-update cost (O(d²)) of FTPL with geometric resampling (GR). In the adversarial setting, it derives regret bounds of O(√(m²d^{1/α}T) + √(mdT)) under Fréchet perturbations and the best possible O(√(mdT)) under Pareto perturbations. To reduce the computational overhead, it extends Conditional Geometric Resampling (CGR) to this setting. CGR lowers the per-update complexity to O(md(log(d/m)+1)) while preserving these regret guarantees. The result pairs the optimal O(√(mdT)) regret (under Pareto perturbations) with a per-update cost well below O(d²) when m ≪ d.
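The two per-update bounds can be compared numerically to get a feel for the size of the claimed saving. The Python sketch below makes two assumptions for illustration only: it drops all constants hidden by the O(·) notation, and it uses the natural logarithm. It therefore compares growth rates, not measured running times. The helper names `gr_cost` and `cgr_cost` are hypothetical.

```python
import math


def gr_cost(d: int) -> float:
    """Per-update bound of the original GR, O(d^2), with constants dropped."""
    return float(d * d)


def cgr_cost(d: int, m: int) -> float:
    """Per-update bound of CGR, O(md(log(d/m) + 1)), with constants dropped.

    The natural log is an arbitrary choice; another base only rescales the log term.
    """
    if not 1 <= m <= d:
        raise ValueError("need 1 <= m <= d")
    return m * d * (math.log(d / m) + 1.0)


# Compare the two constant-free bounds for a few (d, m) pairs.
for d, m in [(100, 1), (1000, 10), (1000, 100), (1000, 1000)]:
    ratio = gr_cost(d) / cgr_cost(d, m)
    print(f"d={d:<5} m={m:<5} d^2={gr_cost(d):<10.0f} "
          f"md(log(d/m)+1)={cgr_cost(d, m):<10.0f} ratio={ratio:.1f}")
```

For x = d/m ≥ 1, ln x ≤ x − 1. The constant-free ratio is therefore never below 1 and equals 1 at m = d. Under these assumptions, the gain concentrates where m is much smaller than d.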
📝 Abstract
This paper studies the optimality and complexity of the Follow-the-Perturbed-Leader (FTPL) policy in size-invariant combinatorial semi-bandit problems. Recently, Honda et al. (2023) and Lee et al. (2024) showed that FTPL achieves Best-of-Both-Worlds (BOBW) optimality in standard multi-armed bandit problems with Fréchet-type distributions. However, the optimality of FTPL in combinatorial semi-bandit problems remains unclear. In this paper, we analyze the regret of FTPL with geometric resampling (GR) in the size-invariant semi-bandit setting. We show that, in the adversarial setting, FTPL achieves $O\left(\sqrt{m^2 d^{\frac{1}{\alpha}} T}+\sqrt{mdT}\right)$ regret with Fréchet distributions and the best possible regret bound of $O\left(\sqrt{mdT}\right)$ with Pareto distributions. Furthermore, we extend conditional geometric resampling (CGR) to the size-invariant semi-bandit setting. This reduces the computational complexity from $O(d^2)$ for the original GR to $O\left(md\left(\log(d/m)+1\right)\right)$ without sacrificing the regret performance of FTPL.
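The FTPL-with-GR policy can be illustrated with a minimal Python sketch, which rests on three assumptions:

- The size-invariant action set is all subsets of exactly $m$ out of $d$ arms.
- The perturbed leader is the $m$ arms minimising $\hat{L}_i - r_i/\eta$, with i.i.d. Fréchet or Pareto perturbations $r_i$ of shape $\alpha$.
- GR draws fresh perturbations, shared across the played arms, until each played arm $i$ is selected again. The resulting count $K_i$ gives the loss estimate $\hat{\ell}_i = K_i \ell_i$.

The function names, the learning-rate schedule $\eta_t = 1/\sqrt{t}$, the toy losses, and the optional `max_draws` truncation cap are hypothetical choices, not the paper's. CGR itself is not implemented here.

```python
import heapq
import math
import random
from typing import Callable, Dict, List, Optional, Sequence

# A perturbation sampler takes (alpha, rng) and returns one positive draw.
Sampler = Callable[[float, random.Random], float]


def frechet(alpha: float, rng: random.Random) -> float:
    """Fréchet(alpha) via the inverse CDF of F(x) = exp(-x**(-alpha)), x > 0."""
    u = rng.random()
    while u == 0.0:  # log(0) is undefined; redraw
        u = rng.random()
    return (-math.log(u)) ** (-1.0 / alpha)


def pareto(alpha: float, rng: random.Random) -> float:
    """Pareto(alpha) on [1, inf), i.e. F(x) = 1 - x**(-alpha)."""
    return rng.paretovariate(alpha)


def ftpl_select(L_hat: Sequence[float], m: int, eta: float, alpha: float,
                sampler: Sampler, rng: random.Random) -> List[int]:
    """Perturbed leader: the m arms with the smallest L_hat[i] - r_i / eta."""
    scores = [L_hat[i] - sampler(alpha, rng) / eta for i in range(len(L_hat))]
    return heapq.nsmallest(m, range(len(L_hat)), key=scores.__getitem__)


def geometric_resampling(L_hat: Sequence[float], chosen: Sequence[int], m: int,
                         eta: float, alpha: float, sampler: Sampler,
                         rng: random.Random,
                         max_draws: Optional[int] = None) -> Dict[int, int]:
    """K_i = index of the first fresh perturbed leader that contains arm i.

    One draw is shared by all pending arms, so each K_i is geometric with
    success probability w_i = P(i is selected), hence E[K_i] = 1 / w_i.
    If the hypothetical cap max_draws is hit, unresolved arms get K_i = max_draws.
    """
    pending = set(chosen)
    counts: Dict[int, int] = {}
    k = 0
    while pending and (max_draws is None or k < max_draws):
        k += 1
        hit = pending & set(ftpl_select(L_hat, m, eta, alpha, sampler, rng))
        for i in hit:
            counts[i] = k
        pending -= hit
    for i in pending:  # non-empty only when the cap was reached
        counts[i] = k
    return counts


def ftpl_gr_round(L_hat: List[float], losses: Sequence[float], m: int,
                  eta: float, alpha: float, sampler: Sampler,
                  rng: random.Random,
                  max_draws: Optional[int] = None) -> List[int]:
    """Play one round and update L_hat in place with ell_hat_i = K_i * ell_i."""
    chosen = ftpl_select(L_hat, m, eta, alpha, sampler, rng)
    # K is computed from the same L_hat that generated this round's action.
    K = geometric_resampling(L_hat, chosen, m, eta, alpha, sampler, rng, max_draws)
    for i in chosen:  # semi-bandit feedback: only played arms are observed
        L_hat[i] += K[i] * losses[i]
    return chosen


if __name__ == "__main__":
    rng = random.Random(0)
    d, m = 6, 2
    L_hat = [0.0] * d
    for t in range(1, 201):
        losses = [0.1 if i < m else 0.9 for i in range(d)]  # hypothetical losses
        eta = 1.0 / math.sqrt(t)  # hypothetical learning-rate schedule
        ftpl_gr_round(L_hat, losses, m, eta, 2.0, pareto, rng, max_draws=1000)
    print([round(x, 1) for x in L_hat])
```

Each resampling draw repeats the full $d$-arm perturbation and top-$m$ selection. The loop runs until the last played arm is hit, so arms with small selection probability make the estimator expensive. That repeated work is what CGR is said to reduce.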