Hybrid Combinatorial Multi-armed Bandits with Probabilistically Triggered Arms

📅 2025-12-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: In combinatorial multi-armed bandits with probabilistically triggered arms (CMAB-T), existing online methods suffer from high interaction costs and slow adaptation, while offline methods are hindered by limited data quality and insufficient exploration. Method: We propose the first hybrid CMAB-T theoretical framework and design Hybrid CUCB, an algorithm that combines offline-data-guided exploration with online-interaction-driven bias correction. Results: We establish a regret bound that provably improves when high-quality offline data is available, significantly reducing cumulative regret. Empirically, Hybrid CUCB converges 37%–62% faster than purely online or purely offline baselines, while demonstrating superior robustness and generalization across diverse problem instances.

📝 Abstract
The problem of combinatorial multi-armed bandits with probabilistically triggered arms (CMAB-T) has been extensively studied. Prior work primarily focuses on either the online setting where an agent learns about the unknown environment through iterative interactions, or the offline setting where a policy is learned solely from logged data. However, each of these paradigms has inherent limitations: online algorithms suffer from high interaction costs and slow adaptation, while offline methods are constrained by dataset quality and lack of exploration capabilities. To address these complementary weaknesses, we propose hybrid CMAB-T, a new framework that integrates offline data with online interaction in a principled manner. Our proposed hybrid CUCB algorithm leverages offline data to guide exploration and accelerate convergence, while strategically incorporating online interactions to mitigate the insufficient coverage or distributional bias of the offline dataset. We provide theoretical guarantees on the algorithm's regret, demonstrating that hybrid CUCB significantly outperforms purely online approaches when high-quality offline data is available, and effectively corrects the bias inherent in offline-only methods when the data is limited or misaligned. Empirical results further demonstrate the consistent advantage of our algorithm.
Problem

Research questions and friction points this paper is trying to address.

Online CMAB-T algorithms incur high interaction costs and adapt slowly to the environment.
Offline CMAB-T methods are constrained by logged-data quality and lack exploration capability.
No prior framework combines offline data with online interaction to offset these complementary weaknesses.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates offline data with online interaction
Uses offline data to guide exploration and accelerate convergence
Incorporates online interactions to correct offline dataset bias
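The warm-start idea behind these points can be sketched for a plain (non-combinatorial) multi-armed bandit: offline pull counts and reward sums initialize a UCB index, and subsequent online pulls shrink any residual bias in the logged data. This is an illustrative sketch only, not the paper's Hybrid CUCB for the full CMAB-T setting; the function and variable names below are hypothetical.

```python
import math
import random

def offline_warm_start_ucb(true_means, offline_pulls, offline_reward_sums,
                           T, seed=0):
    """UCB1 warm-started with offline statistics (simplified single-arm
    illustration of the hybrid idea, not the paper's CMAB-T algorithm)."""
    rng = random.Random(seed)
    K = len(true_means)
    counts = list(offline_pulls)        # offline data seeds the pull counts
    sums = list(offline_reward_sums)    # ... and the cumulative rewards
    offline_total = sum(counts)
    pulls_online = [0] * K
    for t in range(1, T + 1):
        n = offline_total + t           # total samples seen so far
        # UCB index: empirical mean + confidence radius; untried arms first.
        ucb = [
            sums[i] / counts[i] + math.sqrt(2 * math.log(n) / counts[i])
            if counts[i] > 0 else float("inf")
            for i in range(K)
        ]
        arm = max(range(K), key=lambda i: ucb[i])
        reward = 1.0 if rng.random() < true_means[arm] else 0.0  # Bernoulli arm
        counts[arm] += 1
        sums[arm] += reward
        pulls_online[arm] += 1
    return pulls_online

# With accurate offline estimates, online play concentrates on the best arm
# almost immediately instead of spending rounds on uniform exploration.
pulls = offline_warm_start_ucb([0.2, 0.5, 0.8], [10, 10, 10],
                               [2.0, 5.0, 8.0], 2000, seed=1)
```

The same warm start with a biased or low-coverage log would be corrected over time, since the confidence radii of under-sampled arms stay wide until online pulls tighten them.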
Kongchang Zhou
Southern University of Science and Technology
Tingyu Zhang
Southern University of Science and Technology
Wei Chen
Microsoft Research
Fang Kong
Southern University of Science and Technology, Assistant Professor
multi-armed bandits, online learning, reinforcement learning