Near-Optimal Regret for Efficient Stochastic Combinatorial Semi-Bandits

📅 2025-08-08
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In combinatorial multi-armed bandits (CMAB), UCB-type algorithms incur an undesirable $O(\log T)$ regret overhead, while adversarial approaches (e.g., EXP3.M) suffer from excessive computational cost. Method: This paper proposes an efficient randomized combinatorial semi-bandit decision framework and the CMOSS algorithm, which integrates combinatorial optimization, stochastic modeling, and a hybrid semi-/cascade feedback mechanism, grounded in a minimax-optimal strategy design. Contribution/Results: CMOSS guarantees polynomial-time solvability while completely eliminating the $\log T$ factor in regret. Its cumulative regret is bounded by $O\big((\log k)^2 \sqrt{kmT}\big)$, nearly matching the lower bound $\Omega(\sqrt{kmT})$. Extensive experiments on synthetic and real-world datasets demonstrate that CMOSS significantly outperforms baseline methods, including UCB variants and EXP3.M, in both regret and computational efficiency.

📝 Abstract
The combinatorial multi-armed bandit (CMAB) is a cornerstone framework for sequential decision-making, dominated by two algorithmic families: UCB-based methods and adversarial methods such as follow-the-regularized-leader (FTRL) and online mirror descent (OMD). However, prominent UCB-based approaches like CUCB suffer from an additional $\log T$ regret factor that is detrimental over long horizons, while adversarial methods such as EXP3.M and HYBRID impose significant computational overhead. To resolve this trade-off, we introduce the Combinatorial Minimax Optimal Strategy in the Stochastic setting (CMOSS). CMOSS is a computationally efficient algorithm that achieves an instance-independent regret of $O\big((\log k)^2\sqrt{kmT}\big)$ under semi-bandit feedback, where $m$ is the number of arms and $k$ is the maximum cardinality of a feasible action. Crucially, this result eliminates the dependency on $\log T$ and matches the established $\Omega\big(\sqrt{kmT}\big)$ lower bound up to $O\big((\log k)^2\big)$. We then extend our analysis to show that CMOSS is also applicable to cascading feedback. Experiments on synthetic and real-world datasets validate that CMOSS consistently outperforms benchmark algorithms in both regret and runtime efficiency.
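To make the problem setting concrete, the sketch below simulates the stochastic combinatorial semi-bandit protocol the abstract describes: each round the learner picks a subset of at most $k$ of the $m$ arms and observes the reward of every chosen arm (semi-bandit feedback). The learner here is a simple CUCB-style baseline with a top-$k$ oracle, not CMOSS itself, whose strategy is not detailed in this summary; the instance, means, and bonus constant are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
m, k, T = 10, 3, 5000             # m arms; feasible actions = subsets of size <= k
mu = rng.uniform(0.2, 0.8, m)     # hidden Bernoulli means (toy instance, assumed)
best = np.sort(mu)[-k:].sum()     # expected value of the optimal size-k subset

counts = np.zeros(m)              # plays per arm
means = np.zeros(m)               # empirical mean per arm
regret = 0.0
for t in range(1, T + 1):
    # CUCB-style index: empirical mean plus a confidence bonus
    bonus = np.sqrt(1.5 * np.log(t) / np.maximum(counts, 1))
    ucb = np.where(counts == 0, np.inf, means + bonus)
    action = np.argsort(ucb)[-k:]          # top-k oracle for the cardinality constraint
    rewards = rng.random(k) < mu[action]   # semi-bandit: one observation per chosen arm
    counts[action] += 1
    means[action] += (rewards - means[action]) / counts[action]
    regret += best - mu[action].sum()      # expected per-round regret

print(f"cumulative regret after {T} rounds: {regret:.1f}")
```

The $O(\log T)$ overhead the paper targets comes exactly from the `np.log(t)` confidence bonus in such UCB-style indices; CMOSS removes that dependence via a randomized minimax-optimal strategy while keeping the per-round computation polynomial.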
Problem

Research questions and friction points this paper is trying to address.

Resolves computational inefficiency in combinatorial bandits
Eliminates detrimental logarithmic regret factor dependency
Achieves near-optimal regret with semi-bandit feedback
Innovation

Methods, ideas, or system contributions that make the work stand out.

CMOSS algorithm eliminates logarithmic regret dependency
Achieves near-optimal regret with computational efficiency
Extends to cascading feedback with proven performance