🤖 AI Summary
This work addresses the challenge of identifying near-optimal arms in adversarial multi-armed bandits, where historical performance offers little predictive power for future rewards. The paper introduces and formalizes the novel task of "lookahead identification," wherein the learner must pre-specify a future prediction window and commit in advance to an arm whose average reward over that window is close to optimal. Through information-theoretic lower bounds and algorithmic design, the authors demonstrate that non-trivial identification is achievable even under severe information constraints: an error of ε = O(1/√log T) is attainable over Ω(√T) prediction windows, while a fundamental lower bound of ε = Ω(1/log T) holds. Furthermore, they establish that Ω(K) bits of memory are necessary for any nontrivial accuracy in general, though this requirement can be reduced to polylogarithmic levels under a local sparsity assumption.
📝 Abstract
We study an identification problem in multi-armed bandits. In each round a learner selects one of $K$ arms and observes its reward, with the goal of eventually identifying an arm that will perform best at a \emph{future} time. In adversarial environments, however, past performance may offer little information about the future, raising the question of whether meaningful identification is possible at all.
In this work, we introduce \emph{lookahead identification}, a task in which the goal of the learner is to select a future prediction window and commit in advance to an arm whose average reward over that window is within $\varepsilon$ of optimal. Our analysis characterizes both the achievable accuracy of lookahead identification and the memory resources required to obtain it. From an accuracy standpoint, for any horizon $T$ we give an algorithm achieving $\varepsilon = O\bigl(1/\sqrt{\log T}\bigr)$ over $\Omega(\sqrt{T})$ prediction windows. This demonstrates that, perhaps surprisingly, identification is possible in adversarial settings, despite significant lack of information. We also prove a near-matching lower bound showing that $\varepsilon = \Omega\bigl(1/\log T\bigr)$ is unavoidable. We then turn to investigate the role of memory in our problem, first proving that any algorithm achieving nontrivial accuracy requires $\Omega(K)$ bits of memory. Under a natural \emph{local sparsity} condition, we show that the same accuracy guarantees can be achieved using only polylogarithmic memory.
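To make the evaluation criterion concrete, here is a minimal sketch of how a lookahead-identification commitment would be scored. The drifting reward sequences and the explore-then-commit baseline are purely illustrative assumptions for the demo, not the paper's adversarial model or its algorithm; only the error measure (gap to the window-best arm's average) follows the problem definition above.

```python
import math
import random

def window_avgs(rewards, start, width):
    """Average reward of each arm over the window [start, start + width)."""
    return [sum(r[start:start + width]) / width for r in rewards]

def lookahead_error(rewards, arm, start, width):
    """Epsilon achieved by committing to `arm` over the chosen window:
    the gap between the window-best arm's average and the chosen arm's."""
    avgs = window_avgs(rewards, start, width)
    return max(avgs) - avgs[arm]

# Illustrative environment (an assumption for this sketch): K arms with
# slowly drifting rewards in [0.1, 0.9]; the paper assumes nothing of the sort.
random.seed(0)
K, T = 4, 4096
rewards = [
    [0.5 + 0.4 * math.sin(2 * math.pi * (t / T + k / K)) for t in range(T)]
    for k in range(K)
]

# Naive explore-then-commit baseline (NOT the paper's algorithm): watch the
# first T/2 rounds, then commit to the empirically best arm for a window of
# width sqrt(T) starting at T/2.
start, width = T // 2, int(math.isqrt(T))
explore_avgs = window_avgs(rewards, 0, start)
chosen = max(range(K), key=lambda k: explore_avgs[k])
eps = lookahead_error(rewards, chosen, start, width)
print(f"chosen arm = {chosen}, epsilon = {eps:.3f}")
```

On a drifting environment like this one, the arm that looked best in the past need not be best over the future window, which is exactly the obstruction the paper's accuracy bounds quantify.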