AI Summary
This work addresses the pure exploration problem in multi-armed bandits, where the goal is to identify, with confidence at least $1-\delta$, whether there exists an arm whose mean reward is at least a given threshold $\mu_0$, returning "None" if no such arm exists, while minimizing the expected number of samples $\mathbb{E}[\tau]$. Focusing on instances with multiple qualifying arms, the paper establishes the first tight information-theoretic lower bound on the sample complexity via optimization-theoretic arguments and introduces a new algorithm that combines adaptive sampling with confidence-level control. The proposed method achieves an expected sample complexity matching the lower bound up to polylogarithmic factors across all problem instances, thereby resolving a long-standing open question about the tightness of the sample complexity in this setting.
Abstract
1-identification is a fundamental pure-exploration formulation in multi-armed bandits. An agent aims to determine whether there exists a qualified arm whose mean reward is at least a known threshold $\mu_0$, or to output \textsf{None} if it believes no such arm exists. The agent must guarantee that its output is correct with probability at least $1-\delta$, while keeping the expected total number of pulls $\mathbb{E}\tau$ as small as possible. We study 1-identification and make two main contributions. (1) Using an optimization formulation, we derive a new lower bound on $\mathbb{E}\tau$ when at least one qualified arm exists. (2) We design a new algorithm and prove upper bounds that match the lower bound up to a polylogarithmic factor across all problem instances. Our results complete the analysis of $\mathbb{E}\tau$ when there are multiple qualified arms, an open problem left by prior literature.