🤖 AI Summary
This paper studies the contextual combinatorial semi-bandit problem, aiming to design a unified algorithm achieving optimal regret in both adversarial and $\varepsilon$-corrupted stochastic environments. We propose the first dual-optimal algorithm based on the Shannon-entropy-regularized Follow-the-Regularized-Leader (FTRL) framework: it attains $\widetilde{\mathcal{O}}(\sqrt{T})$ regret under adversarial contexts and $\widetilde{\mathcal{O}}(\ln T)$ logarithmic regret in $\varepsilon$-corrupted stochastic settings. Our key technical innovation lies in leveraging KKT conditions to reduce the high-dimensional Bregman projection, which is central to FTRL, to a one-dimensional root-finding problem, thereby bypassing the standard projection bottleneck and significantly improving per-round computational efficiency. To our knowledge, this is the first method in the combinatorial semi-bandit setting that simultaneously achieves theoretically optimal regret bounds in both environments while maintaining practical scalability.
📝 Abstract
We introduce the first best-of-both-worlds algorithm for contextual combinatorial semi-bandits that simultaneously guarantees $\widetilde{\mathcal{O}}(\sqrt{T})$ regret in the adversarial regime and $\widetilde{\mathcal{O}}(\ln T)$ regret in the corrupted stochastic regime. Our approach builds on the Follow-the-Regularized-Leader (FTRL) framework equipped with a Shannon entropy regularizer, yielding a flexible method that admits efficient implementations. Beyond regret bounds, we tackle the practical bottleneck in FTRL (or, equivalently, Online Stochastic Mirror Descent) arising from the high-dimensional projection step encountered in each round of interaction. By leveraging the Karush-Kuhn-Tucker conditions, we transform the $K$-dimensional convex projection problem into a single-variable root-finding problem, dramatically accelerating each round. Empirical evaluations demonstrate that this combined strategy not only attains the attractive regret bounds of best-of-both-worlds algorithms but also delivers substantial per-round speed-ups, making it well-suited for large-scale, real-time applications.
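The KKT reduction can be illustrated with a minimal sketch. The abstract does not state the action set or the exact objective, so the setup here is an assumption: the learner picks $m$ of $K$ arms, the relaxed decision set is $\{x \in [0,1]^K : \sum_i x_i = m\}$, and the FTRL step minimizes $\langle x, L\rangle + \frac{1}{\eta}\sum_i (x_i \ln x_i - x_i)$ over that set. Under this assumption, stationarity gives $x_i = \min\{1, \exp(-\eta (L_i + \lambda))\}$, so only the scalar multiplier $\lambda$ of the sum constraint is unknown. The function name `capped_entropy_projection`, the parameters `losses`, `m`, `eta`, and the use of plain bisection are illustrative choices, not the paper's implementation.

```python
import math


def capped_entropy_projection(losses, m, eta, tol=1e-12, max_iter=200):
    """Hypothetical helper: entropy-regularized FTRL step over
    {x in [0,1]^K : sum(x) = m}, solved by bisection on one multiplier."""
    K = len(losses)
    if not 0 < m <= K:
        raise ValueError("m must satisfy 0 < m <= K")

    def x_of(lam):
        # KKT stationarity with the cap x_i <= 1:
        #   x_i = min(1, exp(-eta * (L_i + lam))).
        # Clamping the exponent at 0 applies the cap and avoids overflow.
        return [math.exp(min(0.0, -eta * (l + lam))) for l in losses]

    def excess(lam):
        # Non-increasing in lam; a root makes the sum constraint hold.
        return sum(x_of(lam)) - m

    # At lo every coordinate is capped at 1, so the sum is K >= m.
    lo = -max(losses)
    # At hi every coordinate is at most m / K, so the sum is <= m.
    hi = -min(losses) + math.log(K / m) / eta

    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return x_of(hi)
```

Each evaluation of `excess` costs $O(K)$ and bisection needs only a logarithmic number of evaluations in the bracket width over `tol`, which is where a per-round saving over a generic $K$-dimensional convex solver would come from. A bracketing solver with faster convergence could replace the bisection loop without changing the structure.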