Optimizing Resource-Constrained Non-Pharmaceutical Interventions for Multi-Cluster Outbreak Control Using Hierarchical Reinforcement Learning

📅 2026-03-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of allocating non-pharmaceutical interventions under limited resources during concurrent outbreaks across multiple heterogeneous epidemic clusters. It proposes what the authors describe as the first scalable hierarchical reinforcement learning framework for this setting, formulating the problem as a constrained restless multi-armed bandit. A global controller dynamically adjusts overall resource demand via a continuous-action cost multiplier, while local policies estimate the marginal intervention value of individuals within each cluster. The approach accommodates asynchronously emerging clusters of varying scale and risk profile while satisfying both resource constraints and real-time decision-making requirements. Evaluated on agent-based SARS-CoV-2 transmission simulations, the method significantly outperforms baseline strategies, improving epidemic control by 20%–30% across diverse system scales and testing budgets, and efficiently handles up to 40 concurrent clusters.

📝 Abstract
Non-pharmaceutical interventions (NPIs), such as diagnostic testing and quarantine, are crucial for controlling infectious disease outbreaks but are often constrained by limited resources, particularly in early outbreak stages. In real-world public health settings, resources must be allocated across multiple outbreak clusters that emerge asynchronously, vary in size and risk, and compete for a shared resource budget. Here, a cluster corresponds to a group of close contacts generated by a single infected index case. Thus, decisions must be made under uncertainty and heterogeneous demands, while respecting operational constraints. We formulate this problem as a constrained restless multi-armed bandit and propose a hierarchical reinforcement learning framework. A global controller learns a continuous action cost multiplier that adjusts global resource demand, while a generalized local policy estimates the marginal value of allocating resources to individuals within each cluster. We evaluate the proposed framework in a realistic agent-based simulator of SARS-CoV-2 with dynamically arriving clusters. Across a wide range of system scales and testing budgets, our method consistently outperforms RMAB-inspired and heuristic baselines, improving outbreak control effectiveness by 20%-30%. Experiments on up to 40 concurrently active clusters further demonstrate that the hierarchical framework is highly scalable and enables faster decision-making than the RMAB-inspired method.
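The allocation mechanism the abstract describes can be illustrated with a minimal sketch. This is not the paper's implementation: in the actual framework both the cost multiplier and the per-individual marginal values are learned by reinforcement learning, whereas here `tune_multiplier` is a simple bisection stand-in for the global controller and the marginal values are assumed to be given. All function and variable names are hypothetical.

```python
def allocate_tests(marginal_values, lam, budget):
    """Greedy local allocation: select individuals whose estimated
    marginal intervention value exceeds the global cost multiplier
    `lam`, highest value first, until the shared budget is exhausted.

    marginal_values: one list of per-individual values per active cluster
    (in the paper, the output of the generalized local policy).
    """
    # Collect (value, cluster, individual) triples across all clusters.
    candidates = [
        (v, c, i)
        for c, vals in enumerate(marginal_values)
        for i, v in enumerate(vals)
        if v > lam
    ]
    candidates.sort(reverse=True)      # highest marginal value first
    chosen = candidates[:budget]       # hard shared-resource constraint
    return [(c, i) for _, c, i in chosen]


def tune_multiplier(marginal_values, budget, lo=0.0, hi=1.0, iters=30):
    """Bisection stand-in for the learned global controller: raise
    `lam` when aggregate demand exceeds the budget, lower it when
    capacity would sit idle."""
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        demand = sum(v > lam for vals in marginal_values for v in vals)
        if demand > budget:
            lo = lam                   # too much demand -> raise cost
        else:
            hi = lam                   # slack capacity -> lower cost
    return hi
```

For example, with two clusters valued `[[0.9, 0.2], [0.7, 0.8, 0.1]]` and a budget of 2 tests, the tuned multiplier settles near 0.7 and the two highest-value individuals are tested, mirroring the intuition that a single global price coordinates otherwise independent per-cluster decisions.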
Problem

Research questions and friction points this paper is trying to address.

non-pharmaceutical interventions
resource allocation
multi-cluster outbreak
constrained optimization
infectious disease control
Innovation

Methods, ideas, or system contributions that make the work stand out.

hierarchical reinforcement learning
resource-constrained NPIs
restless multi-armed bandit
multi-cluster outbreak control
marginal resource allocation