Few Batches or Little Memory, But Not Both: Simultaneous Space and Adaptivity Constraints in Stochastic Bandits

📅 2026-03-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates the fundamental limitations imposed by simultaneously constraining memory (to W bits) and the number of interaction batches (to B) in stochastic multi-armed bandits. Through an information-theoretic bottleneck analysis, it establishes that any algorithm using only W bits of memory requires at least Ω(K/W) batches to achieve the near-minimax optimal regret bound of Õ(√KT). Furthermore, the paper presents the first algorithm that nearly matches this lower bound, attaining Õ(√KT) regret with merely O(log T) memory and Õ(K) batches. By integrating information-theoretic arguments, local change-of-measure techniques, and batch-adaptive scheduling, this study advances beyond prior results that considered memory or batching constraints in isolation, providing a unified understanding under joint resource limitations.
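The paper's algorithm is not given on this page, so as a hedged illustration only: a minimal sketch of how an interaction pattern with O(log T)-size state and roughly K batches can look, in the spirit of low-memory "champion vs. challenger" bandit schemes. The function name `low_memory_batched_challenge` and the reward callback `pull` are hypothetical, not from the paper; the persistent state is just one arm index and one running mean.

```python
import numpy as np

def low_memory_batched_challenge(pull, K, n):
    """Hedged sketch of a low-memory batched scheme (not the paper's algorithm).

    Persistent state is only (champion index, champion mean estimate), i.e.
    O(log T) bits, and each of the K-1 challenges is one batch in which the
    champion and the challenger are sampled side by side n times each.
    """
    champ = 0
    champ_mean = np.mean([pull(champ) for _ in range(n)])
    for arm in range(1, K):
        # One batch: re-sample the current champion and the challenger.
        champ_est = np.mean([pull(champ) for _ in range(n)])
        chall_est = np.mean([pull(arm) for _ in range(n)])
        if chall_est > champ_est:
            champ, champ_mean = arm, chall_est  # challenger wins
        else:
            champ_mean = champ_est  # champion survives; refresh its estimate
    return champ, champ_mean
```

With K arms this uses Θ(K) batches, illustrating the regime the lower bound says is unavoidable once memory is logarithmic.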

📝 Abstract
We study stochastic multi-armed bandits under simultaneous constraints on space and adaptivity: the learner interacts with the environment in $B$ batches and has only $W$ bits of persistent memory. Prior work shows that each constraint alone is surprisingly mild: near-minimax regret $\widetilde{O}(\sqrt{KT})$ is achievable with $O(\log T)$ bits of memory under fully adaptive interaction, and with a $K$-independent $O(\log\log T)$-type number of batches when memory is unrestricted. We show that this picture breaks down in the simultaneously constrained regime. We prove that any algorithm with a $W$-bit memory constraint must use at least $\Omega(K/W)$ batches to achieve near-minimax regret $\widetilde{O}(\sqrt{KT})$, even under adaptive grids. In particular, logarithmic memory rules out $K$-independent batch complexity. Our proof is based on an information bottleneck. We show that near-minimax regret forces the learner to acquire $\Omega(K)$ bits of information about the hidden set of good arms under a suitable hard prior, whereas $B$ batches and $W$ bits of memory allow the learner to acquire only $O(BW)$ bits of information. A key ingredient is a localized change-of-measure lemma that yields probability-level arm exploration guarantees, which is of independent interest. We also give an algorithm using $O(\log T)$ bits of memory and $\widetilde{O}(K)$ batches that achieves regret $\widetilde{O}(\sqrt{KT})$, which nearly matches our lower bound.
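The counting argument behind the lower bound can be summarized in one line (our paraphrase of the abstract, not the paper's formal statement):

```latex
% Information bottleneck sketch: near-minimax regret requires
% I = \Omega(K) bits about the hidden set of good arms, while each
% of the B batches can carry at most W bits through the memory state.
\begin{align*}
  \Omega(K) \;\le\; I \;\le\; B \cdot W
  \quad\Longrightarrow\quad
  B \;=\; \Omega\!\left(\frac{K}{W}\right).
\end{align*}
```

With $W = O(\log T)$ this gives $B = \Omega(K/\log T)$ batches, which the paper's $\widetilde{O}(K)$-batch algorithm nearly matches.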
Problem

Research questions and friction points this paper is trying to address.

stochastic bandits
space constraints
adaptivity constraints
batch learning
memory complexity
Innovation

Methods, ideas, or system contributions that make the work stand out.

stochastic bandits
memory constraints
batch learning
information bottleneck
minimax regret
Ruiyuan Huang
Fudan University
online learning
Zicheng Lyu
School of Data Science, Fudan University
Xiaoyi Zhu
School of Data Science, Fudan University
Zengfeng Huang
Fudan University
Algorithms, Graphs, Streaming, Learning Theory