🤖 AI Summary
This paper studies the non-stationary online restless multi-armed bandit (RMAB) problem, where arm dynamics and rewards evolve over time under a total variation budget $B$. Addressing the failure of classical RMAB algorithms in dynamic domains such as healthcare and recommendation, we establish the first theoretical framework for non-stationary RMAB. Our method integrates sliding-window estimation with upper confidence bound (UCB) principles and introduces a relaxed regret metric tailored to non-stationary environments. We derive a $\widetilde{\mathcal{O}}(N^2 B^{1/4} T^{3/4})$ dynamic regret bound, substantially improving upon static-baseline guarantees. Experiments confirm robustness and practical efficacy under state drift. Key contributions include: (i) the first comprehensive theoretical foundation for non-stationary RMAB; (ii) variation-budget-driven algorithm design; and (iii) provably sublinear dynamic regret.
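As a quick sanity check on the stated rate (an illustrative remark, not a claim from the paper): holding $N$ fixed, the dynamic regret bound is sublinear in the horizon $T$ exactly when the variation budget grows sublinearly,

```latex
\widetilde{\mathcal{O}}\!\left(N^2 B^{1/4} T^{3/4}\right) = o(T)
\quad\Longleftrightarrow\quad
B^{1/4} = o\!\left(T^{1/4}\right)
\quad\Longleftrightarrow\quad
B = o(T),
```

so in particular a constant budget $B = \mathcal{O}(1)$ recovers a $\widetilde{\mathcal{O}}(N^2 T^{3/4})$ rate.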
📝 Abstract
Online restless multi-armed bandits (RMABs) typically assume that each arm follows a stationary Markov Decision Process (MDP) with fixed state transitions and rewards. However, in real-world applications such as healthcare and recommendation systems, these assumptions often break down due to non-stationary dynamics, posing significant challenges for traditional RMAB algorithms. In this work, we specifically consider an $N$-armed RMAB with non-stationary transitions constrained by a bounded variation budget $B$. Our proposed algorithm, mab, integrates sliding-window reinforcement learning (RL) with an upper confidence bound (UCB) mechanism to simultaneously learn the transition dynamics and their variations. We further establish that mab achieves an $\widetilde{\mathcal{O}}(N^2 B^{\frac{1}{4}} T^{\frac{3}{4}})$ regret bound by leveraging a relaxed definition of regret, providing, for the first time, a foundational theoretical framework for non-stationary RMAB problems.
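The sliding-window idea in the abstract can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's algorithm: it estimates an arm's transition matrix from only the most recent `(state, next_state)` samples (so old, drifted data is discarded) and returns a Hoeffding-style per-state confidence radius, which an optimistic (UCB) planner would add to the estimate. The window size, confidence level `delta`, and radius form are all illustrative assumptions.

```python
import numpy as np

def sliding_window_estimate(window, n_states, delta=0.05):
    """Estimate a transition matrix from a sliding window of
    (state, next_state) samples, with a per-state confidence radius.

    Illustrative sketch only: an optimistic planner would act on the
    confidence set {P : |P[s,:] - p_hat[s,:]| <= radius[s]} per state.
    """
    counts = np.zeros((n_states, n_states))
    for s, s_next in window:
        counts[s, s_next] += 1
    # Visits to each state within the window (column vector for broadcasting).
    visits = counts.sum(axis=1, keepdims=True)
    # Empirical transition probabilities; rows with no visits stay zero.
    p_hat = np.divide(counts, np.maximum(visits, 1))
    # Hoeffding-style radius: wider for states rarely visited in the window.
    radius = np.sqrt(np.log(2 * n_states / delta) / (2 * np.maximum(visits, 1)))
    return p_hat, radius
```

Keeping only a recent window trades estimation variance for bias: a short window tracks drifting dynamics but widens the confidence radius, which is the tension the variation budget $B$ lets the analysis balance.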