Near Optimal Best Arm Identification for Clustered Bandits

📅 2025-05-15
🤖 AI Summary
This paper studies best-arm identification (BAI) in multi-agent multi-armed bandits where the mapping between agents and bandit problems is unknown: $N$ agents are grouped into $M$ clusters, each cluster faces its own stochastic bandit with $K$ arms, and every agent must identify the best arm of the bandit it is learning. We propose two algorithms, Cl-BAI and BAI-Cl, that treat clustering and BAI as separate phases and differ in the order in which they run them. A variant of BAI-Cl achieves order-wise minimax-optimal sample complexity when the number of clusters $M$ is a constant. Both algorithms build on the successive elimination framework, integrating distributed clustering, statistical hypothesis testing, and cross-agent information sharing. We establish $\delta$-PC guarantees, derive sample complexity bounds, and provide a lower bound for the problem class. Experiments on synthetic data and on the MovieLens and Yelp datasets demonstrate significant improvements over baselines; notably, in the $M \ll N$ regime (far fewer clusters than agents), our methods reduce communication and sampling costs by over 40%.

📝 Abstract
This work investigates the problem of best arm identification for multi-agent multi-armed bandits. We consider $N$ agents grouped into $M$ clusters, where each cluster solves a stochastic bandit problem. The mapping between agents and bandits is a priori unknown. Each bandit is associated with $K$ arms, and the goal is to identify the best arm for each agent under a $\delta$-probably correct ($\delta$-PC) framework, while minimizing sample complexity and communication overhead. We propose two novel algorithms: Clustering then Best Arm Identification (Cl-BAI) and Best Arm Identification then Clustering (BAI-Cl). Cl-BAI uses a two-phase approach that first clusters agents based on the bandit problems they are learning, followed by identifying the best arm for each cluster. BAI-Cl reverses the sequence by identifying the best arms first and then clustering agents accordingly. Both algorithms leverage the successive elimination framework to ensure computational efficiency and high accuracy. We establish $\delta$-PC guarantees for both methods, derive bounds on their sample complexity, and provide a lower bound for this problem class. Moreover, when $M$ is small (a constant), we show that the sample complexity of a variant of BAI-Cl is minimax optimal in an order-wise sense. Experiments on synthetic and real-world datasets (MovieLens, Yelp) demonstrate the superior performance of the proposed algorithms in terms of sample and communication efficiency, particularly in settings where $M \ll N$.
Problem

Research questions and friction points this paper is trying to address.

Identify best arm for each agent in clustered bandits
Minimize sample complexity and communication overhead
Solve multi-agent multi-armed bandits with unknown mappings
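
The $\delta$-PC requirement in the list above can be written out as a single condition. The notation here is an assumption made for illustration and is not taken from the paper: $c(i) \in \{1, \dots, M\}$ is the unknown cluster of agent $i$, $a^{*}_{m}$ is the best arm of the bandit faced by cluster $m$, and $\hat{a}_i$ is the arm the algorithm outputs for agent $i$ when it stops.

```latex
% A sketch of the delta-PC condition under the assumed notation above:
% with probability at least 1 - delta, every agent is assigned the best
% arm of the bandit that its (unknown) cluster is learning.
\[
  \mathbb{P}\Bigl( \hat{a}_i = a^{*}_{c(i)} \ \text{for all } i \in \{1, \dots, N\} \Bigr) \;\ge\; 1 - \delta .
\]
```

Sample complexity and communication overhead are then the costs to be minimized subject to this guarantee.
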
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-phase clustering and arm identification approach
Successive elimination framework for efficiency
Minimax optimal sample complexity guarantees
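
To make the successive elimination building block concrete, here is a minimal single-bandit sketch. It is not the paper's Cl-BAI or BAI-Cl procedure: the function name `successive_elimination`, the `pull` callback, the `max_rounds` cap, and the Hoeffding-based confidence radius (which assumes rewards in [0, 1]) are all assumptions made for illustration.

```python
import math
import random


def successive_elimination(pull, num_arms, delta, max_rounds=100_000):
    """Return the index of the arm that survives elimination.

    pull(arm) must return a reward in [0, 1] (assumption of this sketch);
    delta is the allowed probability of returning a suboptimal arm.
    """
    active = list(range(num_arms))
    sums = [0.0] * num_arms
    for t in range(1, max_rounds + 1):
        if len(active) == 1:
            return active[0]
        # Sample every arm still in contention once per round,
        # so all active arms have exactly t samples.
        for arm in active:
            sums[arm] += pull(arm)
        # Hoeffding radius with a union bound over arms and rounds.
        radius = math.sqrt(math.log(4 * num_arms * t * t / delta) / (2 * t))
        best_mean = max(sums[arm] / t for arm in active)
        # Drop an arm once its upper confidence bound falls below
        # the lower confidence bound of the empirical leader.
        active = [arm for arm in active
                  if sums[arm] / t + 2 * radius >= best_mean]
    # Round budget exhausted: fall back to the empirical leader.
    return max(active, key=lambda arm: sums[arm])


if __name__ == "__main__":
    rng = random.Random(0)
    true_means = [0.2, 0.5, 0.9]  # hypothetical Bernoulli arms
    winner = successive_elimination(
        lambda arm: 1.0 if rng.random() < true_means[arm] else 0.0,
        num_arms=len(true_means),
        delta=0.05,
    )
    print("surviving arm:", winner)
```

In the clustered setting described above, agents that face the same bandit could in principle pool such samples, which is where the savings for $M \ll N$ would come from; how the paper's algorithms actually share information is not specified on this page.
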
Yash
C-MInDS, IIT Bombay, India
Nikhil Karamchandani
IIT Bombay
Avishek Ghosh
Department of Computer Science and Engineering, IIT Bombay, India