A single algorithm for both restless and rested rotting bandits

📅 2026-04-23

🤖 AI Summary
This work addresses the multi-armed bandit problem with rewards that decay over time, treating both the “rested” setting (decay caused by past pulls of the arm) and the “restless” setting (decay driven by external factors) within a single framework. The paper proposes the Rotting Adaptive Window UCB (RAW-UCB) algorithm, which achieves near-optimal regret in both rotting settings without prior knowledge of the setting or of the type of non-stationarity (e.g., piecewise constant, bounded variation). RAW-UCB combines an adaptive sliding window with the upper confidence bound (UCB) principle. The theoretical analysis establishes near-optimality, and synthetic and dataset-based experiments confirm the findings.

📝 Abstract
In many application domains (e.g., recommender systems, intelligent tutoring systems), the rewards associated with the actions tend to decrease over time. This decay is caused either by the actions executed in the past (e.g., a user may get bored when songs of the same genre are recommended over and over) or by an external factor (e.g., content becomes outdated). These two situations can be modeled as specific instances of the rested and restless bandit settings, where arms are rotting (i.e., their values decrease over time). These problems were thought to be significantly different, since Levine et al. (2017) showed that state-of-the-art algorithms for restless bandits perform poorly in the rested rotting setting. In this paper, we introduce a novel algorithm, Rotting Adaptive Window UCB (RAW-UCB), that achieves near-optimal regret in both rested and restless rotting bandits, without any prior knowledge of the setting (rested or restless) or of the type of non-stationarity (e.g., piecewise constant, bounded variation). This is in striking contrast with previous negative results showing that no algorithm can achieve similar results as soon as rewards are allowed to increase. We confirm our theoretical findings with a number of synthetic and dataset-based experiments.
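The abstract does not spell out the RAW-UCB index, so the following is only a minimal sketch of how an adaptive-window UCB rule for rotting arms might look. It assumes the index of an arm is the minimum, over all window sizes h, of the mean of the arm's last h rewards plus a confidence width. The width formula, the parameters `sigma` and `alpha`, the toy environment, and all function names are illustrative assumptions, not taken from the paper.

```python
import math
import random


def raw_ucb_index(rewards, t, n_arms, sigma=1.0, alpha=4.0):
    """Assumed index: min over window sizes h of (mean of last h rewards + width)."""
    if not rewards:
        return math.inf  # unpulled arms are tried first
    # Assumed confidence level delta_t = 1 / (n_arms * t**alpha).
    log_term = math.log(n_arms * t ** alpha)
    best = math.inf
    window_sum = 0.0
    # Walk backwards through the arm's history, growing the window one sample at a time.
    for h, reward in enumerate(reversed(rewards), start=1):
        window_sum += reward
        width = sigma * math.sqrt(2.0 * log_term / h)
        best = min(best, window_sum / h + width)
    return best


def select_arm(history, t, sigma=1.0, alpha=4.0):
    """Pick the arm with the largest index (ties go to the lowest arm number)."""
    n_arms = len(history)
    indices = [raw_ucb_index(h, t, n_arms, sigma, alpha) for h in history]
    return max(range(n_arms), key=indices.__getitem__)


def run_rested_rotting(horizon=300, seed=0):
    """Toy rested environment: arm 1 pays 1.0 for its first 50 pulls, then 0.0."""
    rng = random.Random(seed)
    history = [[], []]
    for t in range(1, horizon + 1):
        arm = select_arm(history, t, sigma=0.1)
        pulls = len(history[arm])
        mean = 0.5 if arm == 0 else (1.0 if pulls < 50 else 0.0)
        history[arm].append(mean + rng.gauss(0.0, 0.1))
    return [len(h) for h in history]
```

Taking the minimum over windows means that, for a rotting arm, recent low rewards can pull the index down even when the long-run average is still high, which is the intuition behind using a window that adapts to the data rather than a fixed length. Calling `run_rested_rotting()` returns the number of pulls per arm in the toy environment.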
Problem

Research questions and friction points this paper is trying to address.

rotting bandits
rested bandits
restless bandits
non-stationary rewards
multi-armed bandits
Innovation

Methods, ideas, or system contributions that make the work stand out.

rotting bandits
rested bandits
restless bandits
non-stationary rewards
adaptive window