Optimizing Resource-Constrained Non-Pharmaceutical Interventions for Multi-Cluster Outbreak Control Using Hierarchical Reinforcement Learning

📅 2026-03-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of allocating non-pharmaceutical interventions under limited resources during concurrent outbreaks across multiple heterogeneous epidemic clusters. It proposes what the authors describe as the first scalable hierarchical reinforcement learning framework for this setting, formulating the problem as a constrained restless multi-armed bandit. A global controller dynamically adjusts overall resource demand via a continuous-action cost multiplier, while local policies estimate the marginal intervention value of individuals within each cluster. The approach accommodates asynchronously emerging clusters of varying scale and risk profile while satisfying both resource constraints and real-time decision-making requirements. Evaluated on agent-based SARS-CoV-2 transmission simulations, the method significantly outperforms baseline strategies, improving epidemic control by 20%–30% across diverse system scales and testing budgets, and efficiently handles up to 40 concurrent clusters.

📝 Abstract
Non-pharmaceutical interventions (NPIs), such as diagnostic testing and quarantine, are crucial for controlling infectious disease outbreaks but are often constrained by limited resources, particularly in early outbreak stages. In real-world public health settings, resources must be allocated across multiple outbreak clusters that emerge asynchronously, vary in size and risk, and compete for a shared resource budget. Here, a cluster corresponds to a group of close contacts generated by a single infected index case. Thus, decisions must be made under uncertainty and heterogeneous demands, while respecting operational constraints. We formulate this problem as a constrained restless multi-armed bandit and propose a hierarchical reinforcement learning framework. A global controller learns a continuous action cost multiplier that adjusts global resource demand, while a generalized local policy estimates the marginal value of allocating resources to individuals within each cluster. We evaluate the proposed framework in a realistic agent-based simulator of SARS-CoV-2 with dynamically arriving clusters. Across a wide range of system scales and testing budgets, our method consistently outperforms RMAB-inspired and heuristic baselines, improving outbreak control effectiveness by 20%-30%. Experiments on up to 40 concurrently active clusters further demonstrate that the hierarchical framework is highly scalable and enables faster decision-making than the RMAB-inspired method.
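The allocation mechanism the abstract describes can be illustrated with a minimal sketch. This is not the paper's implementation: in the actual framework both the cost multiplier and the per-individual marginal values are learned by reinforcement learning, whereas here `tune_multiplier` is a simple bisection stand-in for the global controller and the marginal values are assumed to be given. All function and variable names are hypothetical.

```python
def allocate_tests(marginal_values, lam, budget):
    """Greedy local allocation: select individuals whose estimated
    marginal intervention value exceeds the global cost multiplier
    `lam`, highest value first, until the shared budget is exhausted.

    marginal_values: one list of per-individual values per active cluster
    (in the paper, the output of the generalized local policy).
    """
    # Collect (value, cluster, individual) triples across all clusters.
    candidates = [
        (v, c, i)
        for c, vals in enumerate(marginal_values)
        for i, v in enumerate(vals)
        if v > lam
    ]
    candidates.sort(reverse=True)      # highest marginal value first
    chosen = candidates[:budget]       # hard shared-resource constraint
    return [(c, i) for _, c, i in chosen]


def tune_multiplier(marginal_values, budget, lo=0.0, hi=1.0, iters=30):
    """Bisection stand-in for the learned global controller: raise
    `lam` when aggregate demand exceeds the budget, lower it when
    capacity would sit idle."""
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        demand = sum(v > lam for vals in marginal_values for v in vals)
        if demand > budget:
            lo = lam                   # too much demand -> raise cost
        else:
            hi = lam                   # slack capacity -> lower cost
    return hi
```

For example, with two clusters valued `[[0.9, 0.2], [0.7, 0.8, 0.1]]` and a budget of 2 tests, the tuned multiplier settles near 0.7 and the two highest-value individuals are tested, mirroring the intuition that a single global price coordinates otherwise independent per-cluster decisions.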
Problem

Research questions and friction points this paper is trying to address.

non-pharmaceutical interventions
resource allocation
multi-cluster outbreak
constrained optimization
infectious disease control
Innovation

Methods, ideas, or system contributions that make the work stand out.

hierarchical reinforcement learning
resource-constrained NPIs
restless multi-armed bandit
multi-cluster outbreak control
marginal resource allocation