🤖 AI Summary
This paper studies the multi-fidelity best-arm identification problem: identifying the arm with the highest mean reward with high confidence, while allowing arm pulls at varying fidelities—each entailing a precision–cost trade-off—and minimizing total sampling cost. Addressing the lack of tight theoretical guarantees in existing methods, the authors establish the first instance-dependent asymptotic lower bound on the minimal achievable cost and prove that each arm admits a unique optimal fidelity. Building on this insight, they propose a gradient-based adaptive sampling algorithm, grounded in an information-theoretic analysis of the lower bound, that achieves asymptotically optimal cost complexity. Experiments on synthetic and real-world datasets demonstrate that the method significantly outperforms baseline approaches, empirically validating both the existence of arm-specific optimal fidelities and the efficacy of the algorithm.
📝 Abstract
In bandit best-arm identification, an algorithm is tasked with finding the arm with the highest mean reward, with a specified accuracy, as fast as possible. We study multi-fidelity best-arm identification, in which the algorithm can choose to sample an arm at a lower fidelity (less accurate mean estimate) for a lower cost. Several methods have been proposed for tackling this problem, but their optimality remains elusive, notably due to loose lower bounds on the total cost needed to identify the best arm. Our first contribution is a tight, instance-dependent lower bound on the cost complexity. The study of the optimization problem featured in the lower bound provides new insights to devise computationally efficient algorithms, and leads us to propose a gradient-based approach with asymptotically optimal cost complexity. We demonstrate the benefits of the new algorithm compared to existing methods in experiments. Our theoretical and empirical findings also shed light on an intriguing concept of optimal fidelity for each arm.
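To make the setting concrete, here is a minimal toy sketch of the multi-fidelity bandit model described above: each arm can be pulled at a low fidelity (cheap, noisy) or a high fidelity (expensive, precise), and the learner pays the cumulative sampling cost. All numbers (arm means, costs, noise levels) and the naive uniform-sampling strategy are illustrative assumptions, not the paper's algorithm.

```python
import random

# Hypothetical instance: three arms; arm 2 has the highest true mean.
TRUE_MEANS = [0.3, 0.5, 0.8]

# Fidelity -> (cost per pull, reward noise std dev): the precision-cost trade-off.
FIDELITIES = {
    "low":  (1.0, 1.0),   # cheap but noisy mean estimates
    "high": (10.0, 0.1),  # expensive but precise mean estimates
}

def pull(arm, fidelity, rng):
    """Sample `arm` once at `fidelity`; return (observed reward, cost paid)."""
    cost, sigma = FIDELITIES[fidelity]
    return rng.gauss(TRUE_MEANS[arm], sigma), cost

def estimate_means(n_pulls, fidelity, seed=0):
    """Naive baseline: pull every arm n_pulls times at a single fidelity."""
    rng = random.Random(seed)
    total_cost = 0.0
    means = []
    for arm in range(len(TRUE_MEANS)):
        samples = []
        for _ in range(n_pulls):
            reward, cost = pull(arm, fidelity, rng)
            samples.append(reward)
            total_cost += cost
        means.append(sum(samples) / n_pulls)
    return means, total_cost

means, cost = estimate_means(200, "high")
best_arm = max(range(len(means)), key=means.__getitem__)
```

An adaptive algorithm, as studied in the paper, would instead choose per-arm sample allocations and fidelities to reach the same confidence at much lower total cost than this uniform high-fidelity baseline.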