🤖 AI Summary
This paper addresses the approximate optimization of finite-horizon Restless Multi-Armed Bandits (RMABs) under degeneracy, where conventional fluid-approximation-based linear programming (LP) policies suffer an optimality gap of $\Theta(1/\sqrt{N})$, markedly worse than the exponential convergence attained in non-degenerate settings. To overcome this limitation, we propose the first diffusion approximation framework that jointly incorporates both mean and variance information, replacing the standard mean-only fluid model. Integrating stochastic control theory and asymptotic analysis, we develop a novel relaxation and solution methodology tailored to degenerate RMABs. We rigorously prove that our approach achieves an optimality gap of $\tilde{O}(1/N)$, i.e., a deviation of order $1/N$ (up to logarithmic factors) from the true optimal value, thereby substantially improving upon the $O(1/\sqrt{N})$ barrier of LP-based methods. Crucially, our bound is tight with respect to the exact optimal value, not a loose upper bound.
📝 Abstract
We study the finite horizon Restless Multi-Armed Bandit (RMAB) problem with $N$ homogeneous arms, focusing on the challenges posed by degenerate RMABs, which are prevalent in practical applications. While previous work has shown that Linear Programming (LP)-based policies achieve exponentially fast convergence relative to the LP upper bound in non-degenerate models, applying these LP-based policies to degenerate RMABs results in slower convergence rates of $O(1/\sqrt{N})$. We construct a diffusion system that incorporates both the mean and variance of the stochastic processes, in contrast to the fluid system from the LP, which only accounts for the mean, thereby providing a more accurate representation of RMAB dynamics. Consequently, our novel diffusion-resolving policy achieves an optimality gap of $O(1/N)$ relative to the true optimal value, rather than the LP upper bound, revealing that the fluid approximation and the LP upper bound are too loose in degenerate settings. These insights pave the way for constructing policies that surpass the $O(1/\sqrt{N})$ optimality gap for any RMAB, whether degenerate or not.
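The contrast between the fluid and diffusion systems can be sketched schematically as follows; the notation below is illustrative (standard for mean-field and diffusion limits of many-arm systems) and is not taken from the paper itself:

```latex
% Schematic contrast (illustrative notation, not the paper's):
% x_t denotes the empirical state distribution of the N arms,
% u_t the control (activation) decision, W_t a standard Brownian motion.
\begin{align*}
  \text{Fluid system (mean only):}\quad
    & \dot{x}_t = b(x_t, u_t), \\
  \text{Diffusion system (mean and variance):}\quad
    & dX_t = b(X_t, u_t)\,dt + \tfrac{1}{\sqrt{N}}\,\sigma(X_t, u_t)\,dW_t.
\end{align*}
```

The fluid model discards the $O(1/\sqrt{N})$ stochastic fluctuation term entirely, which is harmless in non-degenerate settings but dominates the optimality gap under degeneracy; retaining a variance term of this kind is what allows a policy to track the true dynamics at the finer $1/N$ scale.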