Optimal Control of Fluid Restless Multi-armed Bandits: A Machine Learning Approach

📅 2025-02-06

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses the optimal control of fluid restless multi-armed bandits (FRMABs) whose state equations are affine or quadratic in the state variables. We propose a learning-based framework that combines structural analysis of the optimal control problem, an efficient algorithm for solving fluid instances, and large-scale generation of training data from many initial states. We then use Optimal Classification Trees with hyperplane splits (OCT-H) to learn interpretable state-feedback policies that can be evaluated in real time, rather than re-solving a numerical optimization problem for each new state. The approach is tested on machine maintenance, epidemic control, and fisheries control problems. Empirically, it achieves a speed-up of up to 26 million times over a direct numerical algorithm for fluid problems while yielding high-quality policies. This work bridges the gap between theoretical FRMAB modeling and deployable control.


📝 Abstract

We propose a machine learning approach to the optimal control of fluid restless multi-armed bandits (FRMABs) with state equations that are either affine or quadratic in the state variables. By deriving fundamental properties of FRMAB problems, we design an efficient machine learning based algorithm. Using this algorithm, we solve multiple instances with varying initial states to generate a comprehensive training set. We then learn a state feedback policy using Optimal Classification Trees with hyperplane splits (OCT-H). We test our approach on machine maintenance, epidemic control and fisheries control problems. Our method yields high-quality state feedback policies and achieves a speed-up of up to 26 million times compared to a direct numerical algorithm for fluid problems.
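To make the idea of a tree-based state feedback policy concrete, here is a minimal sketch of how a classification tree with hyperplane splits maps a state vector to an action. It is an illustration only, not the authors' implementation: the class names `Leaf` and `Split`, the function `policy`, and the weights in `toy_tree` are all hypothetical, and training of the tree (which the paper does with OCT-H on solved instances) is not shown.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class Leaf:
    action: int  # index of the control applied in this region of state space


@dataclass
class Split:
    weights: Sequence[float]  # normal vector of the splitting hyperplane
    threshold: float          # offset of the splitting hyperplane
    left: "Node"              # branch taken when weights . state <= threshold
    right: "Node"             # branch taken otherwise


Node = Union[Leaf, Split]


def policy(tree: Node, state: Sequence[float]) -> int:
    """Walk from the root to a leaf and return the action stored there."""
    node = tree
    while isinstance(node, Split):
        score = sum(w * x for w, x in zip(node.weights, state))
        node = node.left if score <= node.threshold else node.right
    return node.action


# Hypothetical tree over a two-dimensional state (values are made up,
# not taken from the paper).
toy_tree = Split(
    weights=[1.0, -1.0],
    threshold=0.0,
    left=Leaf(action=1),
    right=Split(
        weights=[1.0, 1.0],
        threshold=1.5,
        left=Leaf(action=0),
        right=Leaf(action=2),
    ),
)

if __name__ == "__main__":
    for state in ([0.2, 0.8], [0.9, 0.3], [1.0, 0.8]):
        print(state, "->", policy(toy_tree, state))
```

Evaluating such a policy costs one dot product per level of the tree, which is why a learned tree can respond far faster than solving a control problem from scratch for each state.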

Problem

Research questions and friction points this paper is trying to address.

Optimal control of fluid restless multi-armed bandits

Machine-learning-based algorithm design

State feedback policy learning and application

Innovation

Methods, ideas, or system contributions that make the work stand out.

Machine learning for FRMAB control

Optimal Classification Trees with hyperplane splits (OCT-H)

Efficient algorithm for state feedback

🔎 Similar Papers

Context in Public Health for Underserved Communities: A Bayesian Approach to Online Restless Bandits

2024-02-07 · Citations: 2
