🤖 AI Summary
This work addresses the challenge posed by the exponentially large action space in combinatorial bandits, where the number of actions \(N\) is exponential in the problem's dimension and existing algorithms struggle to simultaneously achieve low regret and computational efficiency. The paper proposes the first efficient algorithm that attains a swap-regret bound scaling polylogarithmically in \(N\) while maintaining a per-round computational cost of only \(\mathrm{polylog}(N)\). By integrating techniques from combinatorial optimization and online learning theory, the method introduces a carefully designed sampling and feedback mechanism that yields a low-overhead framework for swap-regret minimization. This approach offers both tight theoretical guarantees and practical scalability across a range of canonical combinatorial settings.
📝 Abstract
This paper addresses the problem of designing efficient no-swap-regret algorithms for combinatorial bandits, where the number of actions $N$ is exponentially large in the dimensionality of the problem. In this setting, designing an efficient no-swap-regret algorithm translates to achieving swap regret that is sublinear in the horizon $T$ with only polylogarithmic dependence on $N$. In contrast to the weaker notion of external regret minimization -- a problem that is fairly well understood in the literature -- achieving no-swap regret with polylogarithmic dependence on $N$ has remained elusive in combinatorial bandits. Our paper resolves this challenge by introducing a no-swap-regret learning algorithm whose regret scales polylogarithmically in $N$ and is tight for the class of combinatorial bandits. To ground our results, we also demonstrate how to implement the proposed algorithm efficiently -- that is, with a per-iteration complexity that also scales polylogarithmically in $N$ -- across a wide range of well-studied applications.
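For readers unfamiliar with the benchmark, swap regret compares the learner's cumulative reward against the best hindsight "swap function" $\phi$ that remaps every action it played to some replacement action. The sketch below (not the paper's algorithm, and using full-information rewards purely for illustration) computes the empirical swap regret of a played action sequence; the function name and interface are hypothetical:

```python
import numpy as np

def swap_regret(actions, rewards):
    """Empirical swap regret of a played action sequence.

    actions: length-T sequence of played action indices in {0, ..., N-1}.
    rewards: T x N array; rewards[t, a] is the reward of action a at round t
             (full-information rewards, assumed here only for illustration).

    A swap function phi maps each action to a replacement. The best phi
    decomposes per action: for each action i, pick the replacement j that
    maximizes the total reward over the rounds where i was played.
    """
    actions = np.asarray(actions)
    rewards = np.asarray(rewards, dtype=float)
    T, N = rewards.shape
    # Reward actually earned by the learner.
    earned = rewards[np.arange(T), actions].sum()
    # Reward of the best swap function, chosen in hindsight.
    best_swapped = 0.0
    for i in range(N):
        mask = actions == i
        if mask.any():
            # Best single replacement for all rounds where i was played.
            best_swapped += rewards[mask].sum(axis=0).max()
    return best_swapped - earned
```

A no-swap-regret algorithm drives this quantity to $o(T)$; the paper's contribution is doing so with both the regret bound and the per-round computation scaling as $\mathrm{polylog}(N)$, rather than polynomially in the exponentially large $N$.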