🤖 AI Summary
In combinatorial multi-armed bandits (CMAB), UCB-type algorithms incur an undesirable $O(\log T)$ regret overhead, while adversarial approaches (e.g., EXP3.M) suffer from excessive computational cost.
Method: This paper proposes an efficient randomized combinatorial semi-bandit decision framework and the CMOSS algorithm, which combines combinatorial optimization with stochastic modeling via a minimax-optimal strategy design, and supports both semi-bandit and cascading feedback.
Contribution/Results: CMOSS guarantees polynomial-time solvability while completely eliminating the $\log T$ factor in regret. Its cumulative regret is theoretically bounded by $O\big((\log k)^2 \sqrt{kmT}\big)$, nearly matching the lower bound $\Omega\big(\sqrt{kmT}\big)$. Extensive experiments on synthetic and real-world datasets demonstrate that CMOSS significantly outperforms baseline methods, including UCB variants and EXP3.M, in both regret performance and computational efficiency.
📝 Abstract
The combinatorial multi-armed bandit (CMAB) is a cornerstone framework for sequential decision-making, dominated by two algorithmic families: UCB-based methods and adversarial methods such as Follow-the-Regularized-Leader (FTRL) and Online Mirror Descent (OMD). However, prominent UCB-based approaches like CUCB suffer from an additional $\log T$ regret factor that is detrimental over long horizons, while adversarial methods such as EXP3.M and HYBRID impose significant computational overhead. To resolve this trade-off, we introduce the Combinatorial Minimax Optimal Strategy in the Stochastic setting (CMOSS). CMOSS is a computationally efficient algorithm that achieves an instance-independent regret of $O\big((\log k)^2 \sqrt{kmT}\big)$ under semi-bandit feedback, where $m$ is the number of arms and $k$ is the maximum cardinality of a feasible action. Crucially, this result eliminates the dependency on $\log T$ and matches the established $\Omega\big(\sqrt{kmT}\big)$ lower bound up to $O\big((\log k)^2\big)$. We then extend our analysis to show that CMOSS is also applicable to cascading feedback. Experiments on synthetic and real-world datasets validate that CMOSS consistently outperforms benchmark algorithms in both regret and runtime efficiency.
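The practical meaning of eliminating the $\log T$ factor can be seen by plugging sample values into the bounds quoted above. The sketch below (function names and the choices of $k$, $m$, $T$ are illustrative, not from the paper) shows that the gap between the CMOSS bound and the $\Omega(\sqrt{kmT})$ lower bound stays at $(\log k)^2$ regardless of horizon, whereas a hypothetical bound carrying an extra $\sqrt{\log T}$ factor drifts away from the lower bound as $T$ grows:

```python
import math

def cmoss_bound(k, m, T):
    """CMOSS upper bound O((log k)^2 * sqrt(k*m*T)), constants dropped."""
    return (math.log(k) ** 2) * math.sqrt(k * m * T)

def minimax_lower_bound(k, m, T):
    """Minimax lower bound Omega(sqrt(k*m*T)), constants dropped."""
    return math.sqrt(k * m * T)

def log_T_bound(k, m, T):
    """Hypothetical bound with an extra sqrt(log T) factor, for contrast."""
    return math.sqrt(k * m * T * math.log(T))

k, m = 10, 100  # illustrative values: max action size 10, 100 arms
for T in (10**4, 10**6, 10**8):
    gap_cmoss = cmoss_bound(k, m, T) / minimax_lower_bound(k, m, T)
    gap_log_T = log_T_bound(k, m, T) / minimax_lower_bound(k, m, T)
    # CMOSS's gap is (log k)^2, independent of T; the log T gap keeps growing.
    print(f"T={T:>9}: CMOSS gap = {gap_cmoss:.3f}, log-T gap = {gap_log_T:.3f}")
```

Here the CMOSS gap prints the same value at every horizon, while the log-T column grows, which is exactly the "detrimental over long horizons" behavior the abstract attributes to CUCB-style bounds.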