Optimal Control of Fluid Restless Multi-armed Bandits: A Machine Learning Approach

📅 2025-02-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the optimal control of fluid restless multi-armed bandits (FRMABs) whose state equations are affine or quadratic in the state variables. It proposes a learning-based framework: derive structural properties of FRMAB problems, use them to design an efficient solution algorithm, and solve many instances from varying initial states to generate training data. Optimal Classification Trees with hyperplane splits (OCT-H) are then fit to that data to obtain an interpretable state feedback policy that can be evaluated in real time, in place of re-solving a numerical optimization problem for each new state. The approach is tested on machine maintenance, epidemic control, and fisheries control problems. It yields high-quality policies and is up to 26 million times faster than a direct numerical algorithm for fluid problems.

📝 Abstract
We propose a machine learning approach to the optimal control of fluid restless multi-armed bandits (FRMABs) with state equations that are either affine or quadratic in the state variables. By deriving fundamental properties of FRMAB problems, we design an efficient machine learning based algorithm. Using this algorithm, we solve multiple instances with varying initial states to generate a comprehensive training set. We then learn a state feedback policy using Optimal Classification Trees with hyperplane splits (OCT-H). We test our approach on machine maintenance, epidemic control and fisheries control problems. Our method yields high-quality state feedback policies and achieves a speed-up of up to 26 million times compared to a direct numerical algorithm for fluid problems.
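
To make the pipeline in the abstract concrete, the following is a minimal Python sketch of its two learning-side ingredients: a classification tree whose internal nodes test a hyperplane condition a·x < b (the defining feature of hyperplane splits), used as a state feedback policy, and a loop that labels sampled initial states to build a training set. Everything here is an illustrative assumption rather than the paper's implementation: the names `Leaf`, `Split`, `policy`, `generate_training_set`, and `TOY_TREE` are hypothetical, the tree is written by hand instead of being learned by OCT-H, and the `solve` argument is a stand-in for the FRMAB solver, which this summary does not describe.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple, Union


@dataclass
class Leaf:
    action: int  # index of the control chosen at this leaf (hypothetical encoding)


@dataclass
class Split:
    weights: Sequence[float]  # hyperplane normal a
    threshold: float          # hyperplane offset b
    left: "Node"              # branch taken when a.x < b
    right: "Node"             # branch taken otherwise


Node = Union[Leaf, Split]
State = Tuple[float, ...]


def policy(tree: Node, state: Sequence[float]) -> int:
    """Evaluate a hyperplane-split tree as a state feedback policy."""
    node = tree
    while isinstance(node, Split):
        score = sum(w * x for w, x in zip(node.weights, state))
        node = node.left if score < node.threshold else node.right
    return node.action


def generate_training_set(
    solve: Callable[[State], int],
    n_samples: int,
    dim: int,
    seed: int = 0,
) -> List[Tuple[State, int]]:
    """Sample initial states in [0, 1)^dim and label each with solve(state).

    `solve` is an assumed placeholder for a solver that returns the
    action to apply at the given state.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n_samples):
        x0 = tuple(rng.random() for _ in range(dim))
        data.append((x0, solve(x0)))
    return data


# Hand-written toy tree for a two-dimensional state:
# choose action 1 when x1 - x2 < 0, otherwise action 0.
TOY_TREE = Split(weights=(1.0, -1.0), threshold=0.0, left=Leaf(1), right=Leaf(0))

if __name__ == "__main__":
    # Use the toy tree itself as the stand-in labeling rule.
    samples = generate_training_set(lambda x: policy(TOY_TREE, x), n_samples=5, dim=2)
    for x0, action in samples:
        print(x0, action)
```

Evaluating such a tree costs a handful of dot products per decision, which is consistent with the abstract's point that a learned state feedback policy is far cheaper to apply than solving the fluid problem numerically at each state.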
Problem

Research questions and friction points this paper is trying to address.

Optimal control of fluid restless multi-armed bandits
Machine learning based algorithm design
State feedback policy learning and application
Innovation

Methods, ideas, or system contributions that make the work stand out.

Machine learning for FRMAB control
Optimal Classification Trees with hyperplane splits (OCT-H)
Efficient algorithm for state feedback