đ¤ AI Summary
This work investigates the fundamental limitations imposed by simultaneously constraining memory (to W bits) and the number of interaction batches (to B rounds) in stochastic multi-armed bandits. Through an information-theoretic bottleneck analysis, it establishes that any algorithm using only W bits of memory requires at least Ί(K/W) batches to achieve the near-minimax optimal regret bound of Ă(âKT). Furthermore, the paper presents the first algorithm that matches this lower bound, attaining Ă(âKT) regret with merely O(log T) memory and Ă(K) batches. By integrating information-theoretic arguments, local measure change techniques, and batch-adaptive scheduling, this study advances beyond prior results that considered memory or batching constraints in isolation, providing a unified understanding under joint resource limitations.
đ Abstract
We study stochastic multi-armed bandits under simultaneous constraints on space and adaptivity: the learner interacts with the environment in $B$ batches and has only $W$ bits of persistent memory. Prior work shows that each constraint alone is surprisingly mild: near-minimax regret $\widetilde{O}(\sqrt{KT})$ is achievable with $O(\log T)$ bits of memory under fully adaptive interaction, and with a $K$-independent $O(\log\log T)$-type number of batches when memory is unrestricted. We show that this picture breaks down in the simultaneously constrained regime. We prove that any algorithm with a $W$-bit memory constraint must use at least $Ί(K/W)$ batches to achieve near-minimax regret $\widetilde{O}(\sqrt{KT})$ , even under adaptive grids. In particular, logarithmic memory rules out $K$-independent batch complexity.
Our proof is based on an information bottleneck. We show that near-minimax regret forces the learner to acquire $Ί(K)$ bits of information about the hidden set of good arms under a suitable hard prior, whereas an algorithm with $B$ batches and $W$ bits of memory allows only $O(BW)$ bits of information. A key ingredient is a localized change-of-measure lemma that yields probability-level arm exploration guarantees, which is of independent interest. We also give an algorithm using $O(\log T)$ bits of memory and $\widetilde{O}(K)$ batches that achieves regret $\widetilde{O}(\sqrt{KT})$, which nearly matches our lower bound.