🤖 AI Summary
This work addresses online learning in the sliding window model, covering both the expert advice and streaming multi-armed bandit settings, where recent data carry greater importance than older data. The proposed algorithm achieves regret √(nW)·polylog(nT) over any window of the last W days using only two expert queries per day and polylog(nT) bits of memory. More generally, for every interval I of days it achieves √(n|I|)·polylog(nT) regret with the same two queries and polylog(nT) memory, an exponential improvement in memory over previous interval regret algorithms. In the streaming bandit setting, where only one query is allowed per day, it attains regret nT^{2/3}·polylog(T) with polylog(nT) memory, which improves to the optimal O(√(nT)) when the best expert's losses arrive in random order. This is the first approach to combine sublinear regret with polylogarithmic memory in the streaming bandit problem.
📝 Abstract
Motivated by recent work on the experts problem in the streaming model, we consider the experts problem in the sliding window model. The sliding window model is a well-studied model that captures applications such as traffic monitoring, epidemic tracking, and automated trading, where recent information is more valuable than older data. Formally, we have $n$ experts and $T$ days, we may query the predictions of $q$ experts on each day using a limited amount of memory, and the goal is to achieve the (near-)optimal $\sqrt{nW}\,\text{polylog}(nT)$ regret over any window of the last $W$ days. While it is impossible to achieve such regret with $1$ query, we show that $2$ queries suffice, using only $\text{polylog}(nT)$ bits of memory. Not only are our algorithms optimal for sliding windows, but we also show that for every interval $\mathcal{I}$ of days we achieve $\sqrt{n|\mathcal{I}|}\,\text{polylog}(nT)$ regret with $2$ queries and only $\text{polylog}(nT)$ bits of memory, providing an exponential improvement on the memory of previous interval regret algorithms. Building upon these techniques, we address the bandit problem in data streams, where $q=1$, achieving $nT^{2/3}\,\text{polylog}(T)$ regret with $\text{polylog}(nT)$ memory, which is the first sublinear regret in the streaming model in the bandit setting with polylogarithmic memory; this can be further improved to the optimal $\mathcal{O}(\sqrt{nT})$ regret if the best expert's losses are in a random order.
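To make the objective concrete, here is a minimal sketch of the sliding-window regret quantity the abstract defines: the player's cumulative loss over the last $W$ days minus the loss of the single best expert over those same days. The placeholder strategy below (follow-the-leader over the last $W$ days, which stores all recent losses and thus needs far more than polylog memory) is a hypothetical baseline for illustration only, not the paper's algorithm; the loss instance is likewise invented.

```python
import random

def sliding_window_regret(losses, picks, W):
    """Player's total loss on the last W of T days minus the loss of
    the best single expert on those days. losses[t][i] is expert i's
    loss on day t; picks[t] is the expert the player followed."""
    T = len(losses)
    n = len(losses[0])
    window = range(max(0, T - W), T)
    player = sum(losses[t][picks[t]] for t in window)
    best = min(sum(losses[t][i] for t in window) for i in range(n))
    return player - best

# Toy instance (invented for illustration): uniform random losses,
# with expert 0 made clearly best inside the final window.
random.seed(0)
n, T, W = 5, 200, 50
losses = [[random.random() for _ in range(n)] for _ in range(T)]
for t in range(T - W, T):
    losses[t][0] = 0.0  # expert 0 dominates the recent window

# Placeholder strategy, NOT the paper's algorithm: follow the leader
# over the last W days (this stores Omega(nW) loss values, exactly
# the memory cost the paper's 2-query algorithm avoids).
picks = []
for t in range(T):
    lo = max(0, t - W)
    totals = [sum(losses[s][i] for s in range(lo, t)) for i in range(n)]
    picks.append(min(range(n), key=lambda i: totals[i]))

print(sliding_window_regret(losses, picks, W))
```

Since each per-day loss lies in $[0,1]$, the window regret is trivially at most $W$; the paper's contribution is achieving $\sqrt{nW}\,\text{polylog}(nT)$ with only $2$ queries and $\text{polylog}(nT)$ bits, which no small-memory baseline like the one above attains.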