🤖 AI Summary
Track-and-Stop and its Sticky variant—widely used pure-exploration algorithms—lack non-asymptotic theoretical guarantees for small confidence levels δ.
Method: We develop explicit, computable finite-sample upper bounds on sample complexity by integrating optimal stopping theory, large-deviation analysis, and adaptive sampling strategies, augmented by environment-dependent optimal arm allocation characterization and tight confidence set construction.
Contribution/Results: Our bounds overcome the limitations of conventional asymptotic analysis (δ → 0) and rigorously establish that both algorithms achieve constant-factor optimality relative to the information-theoretic lower bound for *any* δ ∈ (0,1). Empirical evaluation confirms substantial improvements over existing asymptotically justified methods in finite-sample regimes, providing a theoretically grounded foundation for practical deployment.
📝 Abstract
In pure exploration problems, a statistician sequentially collects information to answer a question about some stochastic and unknown environment. The probability of returning a wrong answer should not exceed a maximum risk parameter $delta$ and good algorithms make as few queries to the environment as possible. The Track-and-Stop algorithm is a pioneering method to solve these problems. Specifically, it is well-known that it enjoys asymptotic optimality sample complexity guarantees for $delta o 0$ whenever the map from the environment to its correct answers is single-valued (e.g., best-arm identification with a unique optimal arm). The Sticky Track-and-Stop algorithm extends these results to settings where, for each environment, there might exist multiple correct answers (e.g., $epsilon$-optimal arm identification). Although both methods are optimal in the asymptotic regime, their non-asymptotic guarantees remain unknown. In this work, we fill this gap and provide non-asymptotic guarantees for both algorithms.