Multi-Agent Reinforcement Learning for Intraday Operating Rooms Scheduling under Uncertainty

📅 2025-12-04
🤖 AI Summary
This paper addresses intraday, multi-objective operating room (OR) scheduling under uncertainty, jointly accounting for elective and emergency case throughput, delays, sequence-dependent setup times, and overtime costs. It proposes a multi-agent reinforcement learning framework built on a cooperative Markov game: each OR is modeled as an agent, training is centralized while execution is decentralized (CTDE), and the policy is learned with proximal policy optimization (PPO). A single reward function combines reference start times taken from a mixed-integer pre-schedule with type-specific quadratic delay penalties, so that efficiency and workload are optimized together. In a simulated setting with six ORs and eight surgery types, the method outperforms six heuristic rules on all seven metrics (throughput, average and maximum delay, overtime duration and cost, utilization imbalance, and makespan), and its gap to a hindsight-optimal solution is quantified.

📝 Abstract
Intraday surgical scheduling is a multi-objective decision problem under uncertainty, balancing elective throughput, urgent and emergency demand, delays, sequence-dependent setups, and overtime. We formulate the problem as a cooperative Markov game and propose a multi-agent reinforcement learning (MARL) framework in which each operating room (OR) is an agent trained with centralized training and decentralized execution. All agents share a policy trained via Proximal Policy Optimization (PPO), which maps rich system states to actions, while a within-epoch sequential assignment protocol constructs conflict-free joint schedules across ORs. A mixed-integer pre-schedule provides reference starting times for electives; we impose type-specific quadratic delay penalties relative to these references and a terminal overtime penalty, yielding a single reward that captures throughput, timeliness, and staff workload. In simulations reflecting a realistic hospital mix (six ORs, eight surgery types, random urgent and emergency arrivals), the learned policy outperforms six rule-based heuristics across seven metrics and three evaluation subsets, and, relative to an ex post MIP oracle, quantifies optimality gaps. Policy analytics reveal interpretable behavior: prioritizing emergencies, batching similar cases to reduce setups, and deferring lower-value electives. We also derive a suboptimality bound for the sequential decomposition under simplifying assumptions. We discuss limitations, including OR homogeneity and the omission of explicit staffing constraints, and outline extensions. Overall, the approach offers a practical, interpretable, and tunable data-driven complement to optimization for real-time OR scheduling.
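The reward structure described in the abstract (type-specific quadratic delay penalties relative to reference start times, plus a terminal overtime penalty) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration and is not taken from the paper: the weight table `DELAY_WEIGHT`, the `completion_bonus` and `cost_per_minute` parameters, the 480-minute shift, and the choice to give non-elective cases a reference time as well (for example, their arrival time).

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical per-type delay weights; the paper's actual values are not given here.
DELAY_WEIGHT: Dict[str, float] = {"elective": 1.0, "urgent": 4.0, "emergency": 10.0}


@dataclass
class CompletedCase:
    case_type: str          # key into DELAY_WEIGHT
    reference_start: float  # minutes from day start (e.g. from a pre-schedule)
    actual_start: float     # minutes from day start


def delay_penalty(case: CompletedCase) -> float:
    """Quadratic penalty on lateness relative to the reference start time."""
    delay = max(0.0, case.actual_start - case.reference_start)
    return DELAY_WEIGHT[case.case_type] * delay ** 2


def overtime_penalty(finish_times: List[float], shift_end: float,
                     cost_per_minute: float) -> float:
    """Terminal penalty: minutes each room runs past the end of the shift."""
    return cost_per_minute * sum(max(0.0, t - shift_end) for t in finish_times)


def episode_reward(cases: List[CompletedCase], finish_times: List[float],
                   shift_end: float = 480.0, completion_bonus: float = 100.0,
                   cost_per_minute: float = 2.0) -> float:
    """Single scalar: throughput bonus minus delay and overtime penalties."""
    throughput = completion_bonus * len(cases)
    delays = sum(delay_penalty(c) for c in cases)
    return throughput - delays - overtime_penalty(finish_times, shift_end, cost_per_minute)
```

Because the delay term is quadratic, one long delay costs more than several short ones of the same total length, which is one plausible reason a learned policy would spread lateness rather than concentrate it on a single case.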
Problem

Research questions and friction points this paper is trying to address.

Develops a multi-agent reinforcement learning framework for intraday surgical scheduling under uncertainty
Addresses balancing elective throughput, urgent demand, delays, setups, and overtime in operating rooms
Proposes a cooperative Markov game approach to optimize real-time OR scheduling decisions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent reinforcement learning for operating room scheduling
Centralized training with decentralized execution using PPO
Sequential assignment protocol for conflict-free joint schedules
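One way to picture the within-epoch sequential assignment idea is the sketch below: idle rooms choose one after another from a shared pool, and each chosen case is removed before the next room decides, so no case can be assigned twice. This is only an illustration under assumptions. The function name `sequential_assignment`, the `score` callback (a stand-in for the shared policy), and the fixed room order are hypothetical; the paper's protocol may differ, for instance by letting a room deliberately stay idle.

```python
from typing import Callable, Dict, Hashable, List, Optional

Assignment = Dict[str, Optional[Hashable]]


def sequential_assignment(
    idle_rooms: List[str],
    waiting_cases: List[Hashable],
    score: Callable[[str, Hashable, Assignment], float],
) -> Assignment:
    """Assign at most one waiting case to each idle room, one room at a time.

    `score(room, case, partial)` is a hypothetical stand-in for the shared
    policy; it may inspect the partial assignment built by earlier rooms.
    """
    remaining = list(waiting_cases)  # copy so the caller's list is untouched
    assignment: Assignment = {}
    for room in idle_rooms:
        if not remaining:
            assignment[room] = None  # nothing left to schedule in this epoch
            continue
        best = max(remaining, key=lambda case: score(room, case, assignment))
        assignment[room] = best
        remaining.remove(best)  # later rooms cannot pick the same case
    return assignment
```

Calling it with three idle rooms and two waiting cases yields two distinct assignments and `None` for the last room; conflict-freeness holds by construction, whatever the scoring function returns.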
Authors

Kailiang Liu
Department of Mathematics, National University of Singapore
Ying Chen
Department of Mathematics & Risk Management Institute & Center for Quantitative Finance, National University of Singapore
Ralf Borndörfer
Department of Mathematics and Computer Science, Freie Universität Berlin; Department of Network Optimization, Zuse Institute Berlin
Thorsten Koch
TU Berlin / Zuse Institute Berlin
Mathematics, Linear Programming, Integer Programming