🤖 AI Summary
This paper addresses the failure of classical UCB algorithms in combinatorial multi-armed bandits under graph-structured feedback and combinatorial linear contextual settings, where insufficient exploration leads to suboptimal performance. We propose a novel framework based on a trichotomous arm elimination mechanism: arms are dynamically classified into “confirmed,” “active,” and “eliminated” sets, coupled with an explicit exploration strategy to ensure adequate sampling. Our approach unifies the modeling of graph-structured feedback and linear contextual information. It achieves the first near-optimal regret bound $\tilde{O}(\sqrt{T})$ for both settings and provides a matching lower bound. The key innovation lies in integrating explicit exploration into the arm elimination paradigm—overcoming fundamental theoretical limitations of UCB-style algorithms under complex feedback structures—and substantially enhancing robustness and adaptability.
📝 Abstract
Combinatorial bandits extend the classical bandit framework to settings where the learner selects multiple arms in each round, motivated by applications such as online recommendation and assortment optimization. While extensions of upper confidence bound (UCB) algorithms arise naturally in this context, adapting arm elimination methods has proved more challenging. We introduce a novel elimination scheme that partitions arms into three categories (confirmed, active, and eliminated), and incorporates explicit exploration to update these sets. We demonstrate the efficacy of our algorithm in two settings: the combinatorial multi-armed bandit with general graph feedback, and the combinatorial linear contextual bandit. In both cases, our approach achieves near-optimal regret, whereas UCB-based methods can provably fail due to insufficient explicit exploration. Matching lower bounds are also provided.
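To make the trichotomous partition concrete, here is a minimal sketch of one plausible confidence-interval classification rule for selecting $k$ arms per round. The thresholds below (comparing an arm's bounds against the $k$-th best bound among the other arms) are an illustrative assumption, not the paper's exact rule, and the explicit exploration step that the paper couples with this partition is not shown.

```python
def classify_arms(means, widths, k):
    """Partition arm indices into (confirmed, active, eliminated) sets.

    means[i]  -- empirical mean reward estimate for arm i
    widths[i] -- confidence radius for arm i
    k         -- number of arms selected per round (assumes k <= len(means) - 1)

    Illustrative rule (an assumption, not the paper's exact criterion):
      confirmed  -- the arm's lower bound beats the k-th largest upper
                    bound among the other arms, so it is certainly top-k
      eliminated -- the arm's upper bound is below the k-th largest lower
                    bound among the other arms, so it is certainly not top-k
      active     -- everything still ambiguous; kept for further exploration
    """
    n = len(means)
    ucbs = [means[i] + widths[i] for i in range(n)]
    lcbs = [means[i] - widths[i] for i in range(n)]

    confirmed, active, eliminated = set(), set(), set()
    for i in range(n):
        # Bounds of the *other* arms, sorted from best to worst.
        other_ucbs = sorted((ucbs[j] for j in range(n) if j != i), reverse=True)
        other_lcbs = sorted((lcbs[j] for j in range(n) if j != i), reverse=True)
        if lcbs[i] > other_ucbs[k - 1]:
            confirmed.add(i)    # at most k-1 other arms can still beat it
        elif ucbs[i] < other_lcbs[k - 1]:
            eliminated.add(i)   # k other arms are certainly better
        else:
            active.add(i)
    return confirmed, active, eliminated
```

With tight confidence intervals the classification is decisive (e.g. `classify_arms([0.9, 0.5, 0.1], [0.05] * 3, k=1)` confirms arm 0 and eliminates the rest), while wide intervals leave all arms active, which is where an explicit exploration policy must direct its samples.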