🤖 AI Summary
This work addresses the challenge of identifying near-optimal arms in adversarial multi-armed bandits, where historical performance offers little predictive power for future rewards. The paper introduces and formalizes the novel task of "lookahead identification," wherein the learner must pre-specify a future prediction window and commit in advance to an arm whose average reward over that window is close to optimal. Through information-theoretic lower bounds and algorithmic design, the authors demonstrate that non-trivial identification is achievable even under severe information constraints: an error of ε = O(1/√log T) is attainable over Ω(√T) prediction windows, while a fundamental lower bound of ε = Ω(1/log T) holds. Furthermore, they establish that Ω(K) bits of memory are necessary for any nontrivial accuracy in general, though this requirement can be reduced to polylogarithmic levels under a local sparsity assumption.
📝 Abstract
We study an identification problem in multi-armed bandits. In each round a learner selects one of $K$ arms and observes its reward, with the goal of eventually identifying an arm that will perform best at a \emph{future} time. In adversarial environments, however, past performance may offer little information about the future, raising the question of whether meaningful identification is possible at all.
In this work, we introduce \emph{lookahead identification}, a task in which the goal of the learner is to select a future prediction window and commit in advance to an arm whose average reward over that window is within $\varepsilon$ of optimal. Our analysis characterizes both the achievable accuracy of lookahead identification and the memory resources required to obtain it. From an accuracy standpoint, for any horizon $T$ we give an algorithm achieving $\varepsilon = O\bigl(1/\sqrt{\log T}\bigr)$ over $\Omega(\sqrt{T})$ prediction windows. This demonstrates that, perhaps surprisingly, identification is possible in adversarial settings, despite significant lack of information. We also prove a near-matching lower bound showing that $\varepsilon = \Omega\bigl(1/\log T\bigr)$ is unavoidable. We then turn to investigate the role of memory in our problem, first proving that any algorithm achieving nontrivial accuracy requires $\Omega(K)$ bits of memory. Under a natural \emph{local sparsity} condition, we show that the same accuracy guarantees can be achieved using only polylogarithmic memory.
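To make the evaluation criterion concrete, here is a minimal sketch of how a lookahead-identification commitment would be scored. The drifting reward sequences and the explore-then-commit baseline are purely illustrative assumptions for the demo, not the paper's adversarial model or its algorithm; only the error measure (gap to the window-best arm's average) follows the problem definition above.

```python
import math
import random

def window_avgs(rewards, start, width):
    """Average reward of each arm over the window [start, start + width)."""
    return [sum(r[start:start + width]) / width for r in rewards]

def lookahead_error(rewards, arm, start, width):
    """Epsilon achieved by committing to `arm` over the chosen window:
    the gap between the window-best arm's average and the chosen arm's."""
    avgs = window_avgs(rewards, start, width)
    return max(avgs) - avgs[arm]

# Illustrative environment (an assumption for this sketch): K arms with
# slowly drifting rewards in [0.1, 0.9]; the paper assumes nothing of the sort.
random.seed(0)
K, T = 4, 4096
rewards = [
    [0.5 + 0.4 * math.sin(2 * math.pi * (t / T + k / K)) for t in range(T)]
    for k in range(K)
]

# Naive explore-then-commit baseline (NOT the paper's algorithm): watch the
# first T/2 rounds, then commit to the empirically best arm for a window of
# width sqrt(T) starting at T/2.
start, width = T // 2, int(math.isqrt(T))
explore_avgs = window_avgs(rewards, 0, start)
chosen = max(range(K), key=lambda k: explore_avgs[k])
eps = lookahead_error(rewards, chosen, start, width)
print(f"chosen arm = {chosen}, epsilon = {eps:.3f}")
```

On a drifting environment like this one, the arm that looked best in the past need not be best over the future window, which is exactly the obstruction the paper's accuracy bounds quantify.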