AI Summary
This paper addresses best arm identification (BAI) in multi-armed bandits under a finite sampling budget, motivated by practical applications such as A/B testing. It proposes Almost Tracking, a novel anytime BAI algorithm with a provable guarantee on the popular $H_1$ risk measure. Unlike prior approaches, it requires no prespecified total budget, permits stopping at any time, and uses all collected samples without discarding data. The algorithm couples a closed-form sampling rule with an adaptive tracking mechanism that dynamically steers the cumulative sample allocation. Grounded in a framework that is provably minimax optimal up to a constant factor, it offers both theoretical soundness and computational efficiency. Experiments on synthetic and real-world datasets show that Almost Tracking outperforms existing fixed-budget and anytime BAI methods, empirically supporting its theoretical guarantees.
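The summary refers to the $H_1$ risk measure without defining it. Assuming the paper follows the conventional fixed-budget BAI definition (this is an assumption; the paper may use a variant), $H_1$ is the standard hardness quantity built from the suboptimality gaps:

```latex
H_1 = \sum_{i=1}^{K} \frac{1}{\Delta_i^{2}},
\qquad
\Delta_i =
\begin{cases}
\mu^{*} - \mu_i & \text{if } i \neq i^{*},\\[2pt]
\min_{j \neq i^{*}} \left( \mu^{*} - \mu_j \right) & \text{if } i = i^{*},
\end{cases}
```

where $i^{*}$ is the best arm and $\mu^{*}$ its mean reward; smaller gaps make $H_1$ larger, reflecting a harder identification problem.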
Abstract
We consider the best arm identification problem, where the goal is to identify the arm with the highest mean reward from a set of $K$ arms under a limited sampling budget. This problem models many practical scenarios such as A/B testing. We consider a class of algorithms for this problem that is provably minimax optimal up to a constant factor. This framework generalizes existing work on fixed-budget best arm identification, which is limited to particular choices of risk measures. Based on the framework, we propose Almost Tracking, a closed-form algorithm with a provable guarantee on the popular risk measure $H_1$. Unlike existing algorithms, Almost Tracking requires neither the total budget in advance nor discarding a significant part of the samples, which gives it a practical advantage. Through experiments on synthetic and real-world datasets, we show that our algorithm outperforms existing anytime algorithms as well as fixed-budget algorithms.
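The abstract describes a closed-form sampling rule with adaptive tracking, but does not reproduce the rule itself. As a generic illustration of the tracking idea common to this family of algorithms (pull the arm whose empirical count lags furthest behind a target allocation, so the rule is anytime and uses every sample), here is a minimal Python sketch. The function name, the uniform default weights, and the forced-exploration round are all illustrative assumptions, not the paper's Almost Tracking algorithm.

```python
def anytime_tracking_bai(pull, K, horizon, weights=None):
    """Illustrative anytime tracking-style allocation for BAI.

    NOTE: this is a generic sketch of the tracking idea, NOT the paper's
    Almost Tracking rule. `pull(i)` returns a (possibly stochastic) reward
    for arm i; `weights` is a target allocation proportion over the K arms
    (uniform by default). Assumes horizon >= K.
    """
    if weights is None:
        weights = [1.0 / K] * K
    counts = [0] * K      # number of pulls of each arm
    sums = [0.0] * K      # cumulative reward of each arm
    for t in range(horizon):
        if t < K:
            i = t  # forced exploration: pull every arm once
        else:
            # tracking step: pull the arm most under-sampled
            # relative to its target share weights[a] * t
            i = min(range(K), key=lambda a: counts[a] - weights[a] * t)
        sums[i] += pull(i)
        counts[i] += 1
    means = [sums[a] / counts[a] for a in range(K)]
    return max(range(K), key=lambda a: means[a])  # recommend empirical best
```

Because the loop can be stopped after any step and the recommendation uses all samples gathered so far, this style of rule is anytime in the sense the abstract describes; the paper's contribution lies in choosing the allocation so that the resulting error guarantee is minimax optimal for $H_1$.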