🤖 AI Summary
This work addresses online learning in the sliding window model, covering both the expert advice and streaming multi-armed bandit settings, where recent data carry greater importance than older data. The proposed algorithm achieves regret √(nW)·polylog(nT) over any window of the last W days using only two expert queries per day and polylog(nT) bits of memory. More generally, for every interval I of days it achieves √(n|I|)·polylog(nT) regret with the same two queries and polylog(nT) memory, an exponential improvement in memory over previous interval regret algorithms. In the streaming bandit setting, where only one query is allowed per day, it attains regret nT^{2/3}·polylog(T) with polylog(nT) memory, which improves to the optimal O(√(nT)) when the best expert's losses arrive in random order. This is the first approach to combine sublinear regret with polylogarithmic memory in the streaming bandit problem.
📝 Abstract
Motivated by recent work on the experts problem in the streaming model, we consider the experts problem in the sliding window model. The sliding window model is a well-studied model that captures applications such as traffic monitoring, epidemic tracking, and automated trading, where recent information is more valuable than older data. Formally, we have $n$ experts and $T$ days, we may query the predictions of $q$ experts on each day using a limited amount of memory, and the goal is to achieve the (near-)optimal $\sqrt{nW}\,\text{polylog}(nT)$ regret over any window of the last $W$ days. While it is impossible to achieve such regret with $1$ query, we show that $2$ queries suffice, using only $\text{polylog}(nT)$ bits of memory. Not only are our algorithms optimal for sliding windows, but we also show that for every interval $\mathcal{I}$ of days we achieve $\sqrt{n|\mathcal{I}|}\,\text{polylog}(nT)$ regret with $2$ queries and only $\text{polylog}(nT)$ bits of memory, providing an exponential improvement on the memory of previous interval regret algorithms. Building upon these techniques, we address the bandit problem in data streams, where $q=1$, achieving $nT^{2/3}\,\text{polylog}(T)$ regret with $\text{polylog}(nT)$ memory, which is the first sublinear regret in the streaming model in the bandit setting with polylogarithmic memory; this can be further improved to the optimal $\mathcal{O}(\sqrt{nT})$ regret if the best expert's losses are in a random order.
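To make the objective concrete, here is a minimal sketch of the sliding-window regret quantity the abstract defines: the player's cumulative loss over the last $W$ days minus the loss of the single best expert over those same days. The placeholder strategy below (follow-the-leader over the last $W$ days, which stores all recent losses and thus needs far more than polylog memory) is a hypothetical baseline for illustration only, not the paper's algorithm; the loss instance is likewise invented.

```python
import random

def sliding_window_regret(losses, picks, W):
    """Player's total loss on the last W of T days minus the loss of
    the best single expert on those days. losses[t][i] is expert i's
    loss on day t; picks[t] is the expert the player followed."""
    T = len(losses)
    n = len(losses[0])
    window = range(max(0, T - W), T)
    player = sum(losses[t][picks[t]] for t in window)
    best = min(sum(losses[t][i] for t in window) for i in range(n))
    return player - best

# Toy instance (invented for illustration): uniform random losses,
# with expert 0 made clearly best inside the final window.
random.seed(0)
n, T, W = 5, 200, 50
losses = [[random.random() for _ in range(n)] for _ in range(T)]
for t in range(T - W, T):
    losses[t][0] = 0.0  # expert 0 dominates the recent window

# Placeholder strategy, NOT the paper's algorithm: follow the leader
# over the last W days (this stores Omega(nW) loss values, exactly
# the memory cost the paper's 2-query algorithm avoids).
picks = []
for t in range(T):
    lo = max(0, t - W)
    totals = [sum(losses[s][i] for s in range(lo, t)) for i in range(n)]
    picks.append(min(range(n), key=lambda i: totals[i]))

print(sliding_window_regret(losses, picks, W))
```

Since each per-day loss lies in $[0,1]$, the window regret is trivially at most $W$; the paper's contribution is achieving $\sqrt{nW}\,\text{polylog}(nT)$ with only $2$ queries and $\text{polylog}(nT)$ bits, which no small-memory baseline like the one above attains.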