Efficient Best-of-Both-Worlds Algorithms for Contextual Combinatorial Semi-Bandits

📅 2025-08-26
🤖 AI Summary
This paper studies the contextual combinatorial semi-bandit problem, aiming to design a single algorithm that achieves optimal regret in both adversarial and $\varepsilon$-corrupted stochastic environments. We propose the first such best-of-both-worlds algorithm, built on the Follow-the-Regularized-Leader (FTRL) framework with a Shannon entropy regularizer: it attains $\widetilde{\mathcal{O}}(\sqrt{T})$ regret in the adversarial regime and $\widetilde{\mathcal{O}}(\ln T)$ regret in the $\varepsilon$-corrupted stochastic regime. The key technical step uses the KKT conditions to reduce the high-dimensional Bregman projection at the core of FTRL to a one-dimensional root-finding problem, which removes the standard projection bottleneck and lowers the per-round computational cost. To our knowledge, this is the first method in the contextual combinatorial semi-bandit setting that achieves these regret bounds in both environments while remaining scalable in practice.

📝 Abstract
We introduce the first best-of-both-worlds algorithm for contextual combinatorial semi-bandits that simultaneously guarantees $\widetilde{\mathcal{O}}(\sqrt{T})$ regret in the adversarial regime and $\widetilde{\mathcal{O}}(\ln T)$ regret in the corrupted stochastic regime. Our approach builds on the Follow-the-Regularized-Leader (FTRL) framework equipped with a Shannon entropy regularizer, yielding a flexible method that admits efficient implementations. Beyond regret bounds, we tackle the practical bottleneck in FTRL (or, equivalently, Online Stochastic Mirror Descent) arising from the high-dimensional projection step encountered in each round of interaction. By leveraging the Karush-Kuhn-Tucker conditions, we transform the $K$-dimensional convex projection problem into a single-variable root-finding problem, dramatically accelerating each round. Empirical evaluations demonstrate that this combined strategy not only attains the attractive regret bounds of best-of-both-worlds algorithms but also delivers substantial per-round speed-ups, making it well-suited for large-scale, real-time applications.
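
To make the KKT reduction concrete, here is a sketch of how a $K$-dimensional entropy-regularized step can collapse to one scalar equation. Everything in it is an illustrative assumption rather than the paper's exact formulation: the decision set is taken to be $\{x \in [0,1]^K : \sum_i x_i = m\}$ (the convex hull of size-$m$ subsets), $\hat{L}$ is a hypothetical cumulative loss estimate, and $\eta > 0$ is a hypothetical learning rate.

```latex
% FTRL step with the (unnormalized) negative Shannon entropy regularizer
x^\star = \arg\min_{x \in [0,1]^K,\ \sum_i x_i = m}
          \; \eta \langle \hat{L}, x \rangle + \sum_{i=1}^{K} \bigl( x_i \ln x_i - x_i \bigr)

% KKT stationarity, with multiplier \lambda for the sum constraint
% and \mu_i \ge 0 for the upper bounds x_i \le 1
\eta \hat{L}_i + \ln x_i + \lambda + \mu_i = 0, \qquad \mu_i \,(x_i - 1) = 0

% Eliminating \mu_i leaves a closed form in the single unknown \lambda
x_i(\lambda) = \min\bigl\{ 1,\ \exp(-\eta \hat{L}_i - \lambda) \bigr\}

% \lambda is the root of a continuous, nonincreasing scalar function
\sum_{i=1}^{K} x_i(\lambda) = m
```

Under these assumptions the constraint $x_i \ge 0$ never binds, because the entropy term keeps every coordinate strictly positive, and the monotonicity of the left-hand side in $\lambda$ means a bracketing method such as bisection can locate the root.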
Problem

Research questions and friction points this paper is trying to address.

Develops best-of-both-worlds algorithm for contextual combinatorial semi-bandits
Solves high-dimensional projection bottleneck in FTRL framework
Achieves $\widetilde{\mathcal{O}}(\sqrt{T})$ regret in the adversarial regime and $\widetilde{\mathcal{O}}(\ln T)$ regret in the corrupted stochastic regime
Innovation

Methods, ideas, or system contributions that make the work stand out.

FTRL framework with Shannon entropy regularizer
KKT conditions for projection simplification
Single-variable root-finding for acceleration
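
The idea behind the last two items can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes actions are size-m subsets of K base arms (relaxed decision set: x in [0,1]^K with coordinates summing to m), a plain cumulative loss vector, and a fixed learning rate. Under those assumptions the KKT conditions give x_i = min(1, exp(-eta * L_i - lam)), so only the scalar lam has to be found. The function name `entropy_projection` and all of its parameters are hypothetical.

```python
import math


def entropy_projection(cum_loss, m, eta, tol=1e-12, max_iter=200):
    """Entropy-regularized FTRL step on {x in [0,1]^K : sum(x) = m}.

    Hypothetical helper: solves for the single multiplier `lam` by
    bisection instead of running a K-dimensional convex solver.
    """
    K = len(cum_loss)
    if not 0 < m <= K:
        raise ValueError("m must satisfy 0 < m <= K")

    scores = [-eta * loss for loss in cum_loss]

    def coord(s, lam):
        # min(1, exp(s - lam)), written so exp never overflows.
        return math.exp(min(0.0, s - lam))

    def mass(lam):
        # Total probability mass; continuous and nonincreasing in lam.
        return sum(coord(s, lam) for s in scores)

    # Bracket the root: mass(lo) = K >= m and mass(hi) <= m.
    lo = min(scores)
    hi = max(scores) + math.log(K / m)

    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if mass(mid) > m:
            lo = mid  # too much mass, so lam must increase
        else:
            hi = mid  # too little (or exactly enough) mass
        if hi - lo < tol:
            break

    lam = 0.5 * (lo + hi)
    return [coord(s, lam) for s in scores]


if __name__ == "__main__":
    # Four base arms, pick two per round; lower loss gets more mass.
    x = entropy_projection([0.0, 1.0, 2.0, 3.0], m=2, eta=1.0)
    print(x, sum(x))
```

Each evaluation of the scalar function costs O(K), and bisection needs a number of evaluations that grows only logarithmically in 1/tol, so the whole step is near-linear in K in this sketch. A faster bracketing root-finder could replace the bisection loop without changing the structure.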