Hybrid Combinatorial Multi-armed Bandits with Probabilistically Triggered Arms

📅 2025-12-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: In combinatorial multi-armed bandits with probabilistically triggered arms (CMAB-T), existing online methods suffer from high interaction costs and slow adaptation, while offline methods are hindered by limited data quality and insufficient exploration. Method: We propose the first hybrid CMAB-T theoretical framework and design Hybrid CUCB, an algorithm that combines offline-data-guided exploration with online-interaction-driven bias correction. Results: We establish a regret bound that provably improves when high-quality offline data is available, significantly reducing cumulative regret. Empirically, Hybrid CUCB converges 37%–62% faster than purely online or purely offline baselines, while demonstrating superior robustness and generalization across diverse problem instances.

📝 Abstract
The problem of combinatorial multi-armed bandits with probabilistically triggered arms (CMAB-T) has been extensively studied. Prior work primarily focuses on either the online setting where an agent learns about the unknown environment through iterative interactions, or the offline setting where a policy is learned solely from logged data. However, each of these paradigms has inherent limitations: online algorithms suffer from high interaction costs and slow adaptation, while offline methods are constrained by dataset quality and lack of exploration capabilities. To address these complementary weaknesses, we propose hybrid CMAB-T, a new framework that integrates offline data with online interaction in a principled manner. Our proposed hybrid CUCB algorithm leverages offline data to guide exploration and accelerate convergence, while strategically incorporating online interactions to mitigate the insufficient coverage or distributional bias of the offline dataset. We provide theoretical guarantees on the algorithm's regret, demonstrating that hybrid CUCB significantly outperforms purely online approaches when high-quality offline data is available, and effectively corrects the bias inherent in offline-only methods when the data is limited or misaligned. Empirical results further demonstrate the consistent advantage of our algorithm.
Problem

Research questions and friction points this paper is trying to address.

Online CMAB-T algorithms incur high interaction costs and adapt slowly to the environment.
Offline CMAB-T methods are constrained by logged-data quality and lack exploration capability.
No prior framework combines offline data with online interaction to offset these complementary weaknesses.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates offline data with online interaction
Uses offline data to guide exploration and accelerate convergence
Incorporates online interactions to correct offline dataset bias
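The warm-start idea behind these points can be sketched for a plain (non-combinatorial) multi-armed bandit: offline pull counts and reward sums initialize a UCB index, and subsequent online pulls shrink any residual bias in the logged data. This is an illustrative sketch only, not the paper's Hybrid CUCB for the full CMAB-T setting; the function and variable names below are hypothetical.

```python
import math
import random

def offline_warm_start_ucb(true_means, offline_pulls, offline_reward_sums,
                           T, seed=0):
    """UCB1 warm-started with offline statistics (simplified single-arm
    illustration of the hybrid idea, not the paper's CMAB-T algorithm)."""
    rng = random.Random(seed)
    K = len(true_means)
    counts = list(offline_pulls)        # offline data seeds the pull counts
    sums = list(offline_reward_sums)    # ... and the cumulative rewards
    offline_total = sum(counts)
    pulls_online = [0] * K
    for t in range(1, T + 1):
        n = offline_total + t           # total samples seen so far
        # UCB index: empirical mean + confidence radius; untried arms first.
        ucb = [
            sums[i] / counts[i] + math.sqrt(2 * math.log(n) / counts[i])
            if counts[i] > 0 else float("inf")
            for i in range(K)
        ]
        arm = max(range(K), key=lambda i: ucb[i])
        reward = 1.0 if rng.random() < true_means[arm] else 0.0  # Bernoulli arm
        counts[arm] += 1
        sums[arm] += reward
        pulls_online[arm] += 1
    return pulls_online

# With accurate offline estimates, online play concentrates on the best arm
# almost immediately instead of spending rounds on uniform exploration.
pulls = offline_warm_start_ucb([0.2, 0.5, 0.8], [10, 10, 10],
                               [2.0, 5.0, 8.0], 2000, seed=1)
```

The same warm start with a biased or low-coverage log would be corrected over time, since the confidence radii of under-sampled arms stay wide until online pulls tighten them.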
Kongchang Zhou
Southern University of Science and Technology
Tingyu Zhang
Southern University of Science and Technology
Wei Chen
Microsoft Research
Fang Kong
Southern University of Science and Technology, Assistant Professor
multi-armed bandits, online learning, reinforcement learning