🤖 AI Summary
Existing clustering-based contextual bandit algorithms rely on strong assumptions—such as i.i.d. or highly diverse contexts—which limit their practical performance in real-world contextual bandit scenarios. Method: We propose a weak-assumption clustering framework that eliminates these stringent context requirements. We design UniCLUB and PhaseUniCLUB, the first bandit algorithms to incorporate smoothed adversarial context modeling to relax the i.i.d. assumption; they jointly integrate enhanced exploration, refined UCB estimation, smoothed regret analysis, and graph- or set-based clustering mechanisms. Contribution/Results: We theoretically establish that our algorithms achieve regret bounds matching those of state-of-the-art methods, yet under significantly weaker assumptions. Empirical evaluation on both synthetic and real-world datasets demonstrates substantial improvements in cluster identification accuracy and cumulative reward over existing approaches, validating the effectiveness and robustness of our framework.
📝 Abstract
The contextual multi-armed bandit (MAB) problem is crucial in sequential decision-making. A line of research, known as online clustering of bandits, extends contextual MAB by grouping similar users into clusters and utilizing shared features to improve learning efficiency. However, existing algorithms, which rely on the upper confidence bound (UCB) strategy, struggle to gather adequate statistical information to accurately identify unknown user clusters. As a result, their theoretical analyses require several strong assumptions about the "diversity" of contexts generated by the environment, leading to impractical settings, complicated analyses, and poor practical performance. Removing these assumptions has been a long-standing open problem in the clustering of bandits literature. In this paper, we provide two solutions to this open problem. First, following the i.i.d. context generation setting of existing studies, we propose two novel algorithms, UniCLUB and PhaseUniCLUB, which incorporate enhanced exploration mechanisms to accelerate cluster identification. Remarkably, our algorithms require substantially weaker assumptions while achieving regret bounds comparable to prior work. Second, inspired by the smoothed analysis framework, we propose a more practical setting that eliminates the requirement for i.i.d. context generation used in previous studies, thereby improving the performance of existing algorithms for online clustering of bandits. Our technique applies to both graph-based and set-based clustering of bandits frameworks. Extensive evaluations on both synthetic and real-world datasets demonstrate that our proposed algorithms consistently outperform existing approaches.
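For readers unfamiliar with the UCB strategy the abstract refers to, the following is a minimal, self-contained sketch of a LinUCB-style arm-selection step, the building block that clustering-of-bandits methods apply per cluster rather than per user. All names here (`linucb_select`, `alpha`, etc.) are illustrative assumptions, not the paper's actual estimator, which differs in its exploration term and clustering logic.

```python
import numpy as np

def linucb_select(contexts, V, b, alpha=1.0):
    """Pick the arm maximizing estimated reward plus an exploration bonus.

    contexts: (K, d) array of arm feature vectors for this round
    V: (d, d) regularized Gram matrix, sum_t x_t x_t^T + lambda * I
    b: (d,) accumulated reward-weighted contexts, sum_t r_t x_t
    alpha: width of the confidence bonus (hypothetical tuning parameter)
    """
    V_inv = np.linalg.inv(V)
    theta_hat = V_inv @ b                 # ridge estimate of the user parameter
    means = contexts @ theta_hat          # exploitation: predicted rewards
    # exploration bonus: confidence-ellipsoid width ||x||_{V^{-1}} per arm
    widths = np.sqrt(np.einsum("kd,de,ke->k", contexts, V_inv, contexts))
    return int(np.argmax(means + alpha * widths))

# Cold-start usage: identity Gram matrix, no observed rewards yet,
# so selection is driven purely by the exploration widths.
d, K = 5, 4
rng = np.random.default_rng(0)
X = rng.normal(size=(K, d))
arm = linucb_select(X, np.eye(d), np.zeros(d))
```

In a clustering-of-bandits algorithm, `V` and `b` would be aggregated over all users currently estimated to share a cluster, which is exactly where the diversity assumptions the paper removes come into play.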