🤖 AI Summary
This work studies best arm identification in batched multi-armed bandits (MAB) and linear bandits, aiming to identify the best arm accurately while using few batches and few total samples. It proposes an adaptive grid-based sample allocation mechanism that breaks the conventional log(1/Δ₂) batch-complexity barrier and achieves an instance-sensitive batch complexity. The method combines a confidence-interval-driven adaptive batching algorithm, an instance-dependent sample allocation strategy, and a unified framework that extends to linear bandits. The analysis establishes near-optimal sample complexity for standard MAB, with similar improvements for linear bandits. Experiments indicate that the approach is more batch-efficient than baseline methods across a range of problem setups.
📝 Abstract
We investigate the problem of batched best arm identification in multi-armed bandits, where we aim to identify the best arm from a set of $n$ arms while minimizing both the number of samples and batches. We introduce an algorithm that achieves near-optimal sample complexity and features an instance-sensitive batch complexity, which breaks the $\log(1/\Delta_2)$ barrier. The main contribution of our algorithm is a novel sample allocation scheme that effectively balances exploration and exploitation for batch sizes. Experimental results indicate that our approach is more batch-efficient across various setups. We also extend this framework to the problem of batched best arm identification in linear bandits and achieve similar improvements.
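To make the $\log(1/\Delta_2)$ barrier concrete, the following is a minimal sketch of a generic batched elimination baseline. It is not the allocation scheme introduced in this paper. It assumes Bernoulli rewards, takes $\Delta_2$ to be the gap between the best and second-best mean (the usual convention), and uses a textbook Hoeffding-based batch size. The function name `batched_elimination` and its parameters are hypothetical, chosen only for illustration. Because the target accuracy halves in every batch, suboptimal arms are only guaranteed to be eliminated once the accuracy falls below roughly $\Delta_2/2$, so this baseline needs about $\log_2(1/\Delta_2)$ batches.

```python
import math
import random


def batched_elimination(means, delta=0.05, max_batches=6, seed=0):
    """Generic batched elimination baseline (NOT the paper's algorithm).

    means: hypothetical true Bernoulli means, used only to simulate pulls.
    Returns (best_arm_index, batches_used, total_samples).
    """
    rng = random.Random(seed)
    n = len(means)
    active = list(range(n))
    estimates = {}
    batches = 0
    total_samples = 0

    while len(active) > 1 and batches < max_batches:
        batches += 1
        eps = 2.0 ** (-batches)  # target accuracy halves each batch
        # Hoeffding: this many fresh pulls put each estimate within eps/2
        # of its mean, with failure probability delta / (2 * n * batches^2).
        pulls = math.ceil(
            (2.0 / eps ** 2) * math.log(4 * n * batches ** 2 / delta)
        )
        # All pulls of a batch are fixed before any reward is observed.
        estimates = {}
        for arm in active:
            wins = sum(rng.random() < means[arm] for _ in range(pulls))
            estimates[arm] = wins / pulls
        total_samples += pulls * len(active)

        # Keep only arms within eps of the empirical leader.
        leader = max(estimates.values())
        active = [a for a in active if estimates[a] >= leader - eps]

    # If the batch budget runs out, fall back to the empirical leader.
    best = max(active, key=lambda a: estimates.get(a, 0.0))
    return best, batches, total_samples


if __name__ == "__main__":
    best, batches, total = batched_elimination([0.9, 0.5, 0.4])
    print(f"best arm: {best}, batches: {batches}, samples: {total}")
```

In this sketch the batch schedule is fixed in advance and ignores the instance. An instance-sensitive batch complexity, as described in the abstract, instead requires the batch sizes to adapt to the observed gaps.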