Achieving O(1/N) Optimality Gap in Restless Bandits through Diffusion Approximation

📅 2024-10-19
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the approximate optimization of finite-horizon Restless Multi-Armed Bandits (RMABs) under degeneracy, where conventional fluid-approximation-based linear programming (LP) policies suffer an optimality gap of $\Theta(1/\sqrt{N})$, markedly worse than the exponential convergence attained in non-degenerate settings. To overcome this limitation, we propose the first diffusion approximation framework that jointly incorporates both mean and variance information, replacing the standard mean-only fluid model. Integrating stochastic control theory and asymptotic analysis, we develop a novel relaxation and solution methodology tailored to degenerate RMABs. We rigorously prove that our approach achieves an optimality gap of $\tilde{O}(1/N)$, i.e., an absolute deviation of $O(1/N)$ from the true optimal value, thereby substantially improving upon the $O(1/\sqrt{N})$ barrier of LP-based methods. Crucially, our bound is measured against the exact optimal value, not a loose upper bound.

📝 Abstract
We study the finite horizon Restless Multi-Armed Bandit (RMAB) problem with $N$ homogeneous arms, focusing on the challenges posed by degenerate RMABs, which are prevalent in practical applications. While previous work has shown that Linear Programming (LP)-based policies achieve exponentially fast convergence relative to the LP upper bound in non-degenerate models, applying these LP-based policies to degenerate RMABs results in slower convergence rates of $O(1/\sqrt{N})$. We construct a diffusion system that incorporates both the mean and variance of the stochastic processes, in contrast to the fluid system from the LP, which only accounts for the mean, thereby providing a more accurate representation of RMAB dynamics. Consequently, our novel diffusion-resolving policy achieves an optimality gap of $O(1/N)$ relative to the true optimal value, rather than the LP upper bound, revealing that the fluid approximation and the LP upper bound are too loose in degenerate settings. These insights pave the way for constructing policies that surpass the $O(1/\sqrt{N})$ optimality gap for any RMAB, whether degenerate or not.
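The abstract's contrast between the mean-only fluid model and the mean-plus-variance diffusion model can be illustrated with a toy simulation. This is not the paper's policy or relaxation; the two-state arm dynamics, the transition probability `p = 0.3`, and the horizon `T = 5` are invented purely for illustration. The point it shows is the scale the diffusion approximation keeps and the fluid approximation discards: for $N$ homogeneous arms, the empirical population fraction tracks the deterministic fluid trajectory, but its fluctuations shrink like $1/\sqrt{N}$.

```python
import random
import statistics


def simulate_fraction(N, p=0.3, T=5, trials=200, seed=0):
    """Simulate N i.i.d. two-state arms for T steps; each arm in state 0
    moves to state 1 with probability p per step (toy dynamics, not the
    paper's model). Return (mean, std) of the final fraction of arms in
    state 1 across independent trials."""
    rng = random.Random(seed)
    fractions = []
    for _ in range(trials):
        state = [0] * N
        for _ in range(T):
            state = [1 if (s == 1 or rng.random() < p) else 0 for s in state]
        fractions.append(sum(state) / N)
    return statistics.mean(fractions), statistics.stdev(fractions)


def fluid_fraction(p=0.3, T=5):
    """Mean-only fluid recursion for the same toy dynamics:
    x_{t+1} = x_t + (1 - x_t) * p, starting from x_0 = 0."""
    x = 0.0
    for _ in range(T):
        x += (1 - x) * p
    return x
```

Running `simulate_fraction` for increasing `N` shows the empirical mean concentrating on `fluid_fraction()` while the standard deviation across trials drops roughly by a factor of 10 when `N` grows by a factor of 100, i.e., the $O(1/\sqrt{N})$ fluctuation scale that a variance-aware (diffusion) model captures.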
Problem

Research questions and friction points this paper is trying to address.

Closing the optimality gap in degenerate RMABs
Improving LP-based policies with a Gaussian (diffusion) approximation
Reducing the optimality gap to $\tilde{O}(1/N)$ for RMABs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses a Gaussian approximation for RMAB dynamics
Solves a stochastic program to derive the policy
Achieves an optimality gap of $\tilde{O}(1/N)$
Chen Yan
Associate Professor, Zhejiang University, College of EE
CPS Security · Embedded System Security · Sensor Security
Weina Wang
Computer Science Department, Carnegie Mellon University
Lei Ying
Electrical Engineering and Computer Science Department, University of Michigan, Ann Arbor