Multi-Agent Reinforcement Learning for Intraday Operating Rooms Scheduling under Uncertainty

📅 2025-12-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses intraday, multi-objective operating room scheduling under uncertainty, balancing elective and emergency case throughput, delays, sequence-dependent setup times, and overtime. It proposes a multi-agent reinforcement learning framework built on a cooperative Markov game: each operating room (OR) is modeled as an agent, and a shared policy is trained with proximal policy optimization (PPO) under centralized training with decentralized execution (CTDE). A single reward function combines type-specific quadratic delay penalties, measured against reference start times from a mixed-integer pre-schedule, with a terminal overtime penalty, so that it captures throughput, timeliness, and staff workload. In a simulated setting with six ORs and eight surgery types, the learned policy outperforms six rule-based heuristics across seven metrics (throughput, average and maximum delay, overtime duration and cost, utilization imbalance, and makespan), and its optimality gap is quantified against an ex post (hindsight) MIP oracle.

📝 Abstract

Intraday surgical scheduling is a multi-objective decision problem under uncertainty, balancing elective throughput, urgent and emergency demand, delays, sequence-dependent setups, and overtime. We formulate the problem as a cooperative Markov game and propose a multi-agent reinforcement learning (MARL) framework in which each operating room (OR) is an agent trained with centralized training and decentralized execution. All agents share a policy trained via Proximal Policy Optimization (PPO), which maps rich system states to actions, while a within-epoch sequential assignment protocol constructs conflict-free joint schedules across ORs. A mixed-integer pre-schedule provides reference starting times for electives; we impose type-specific quadratic delay penalties relative to these references and a terminal overtime penalty, yielding a single reward that captures throughput, timeliness, and staff workload. In simulations reflecting a realistic hospital mix (six ORs, eight surgery types, random urgent and emergency arrivals), the learned policy outperforms six rule-based heuristics across seven metrics and three evaluation subsets, and, relative to an ex post MIP oracle, quantifies optimality gaps. Policy analytics reveal interpretable behavior: prioritizing emergencies, batching similar cases to reduce setups, and deferring lower-value electives. We also derive a suboptimality bound for the sequential decomposition under simplifying assumptions. We discuss limitations, including OR homogeneity and the omission of explicit staffing constraints, and outline extensions. Overall, the approach offers a practical, interpretable, and tunable data-driven complement to optimization for real-time OR scheduling.
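
The reward structure described in the abstract (type-specific quadratic delay penalties relative to reference start times, plus a terminal overtime penalty) can be illustrated with a minimal Python sketch. This is not the paper's reward function: the weights in `DELAY_WEIGHT`, the `completion_bonus` and `overtime_weight` parameters, the linear form of the overtime term, and the choice of reference time for non-elective cases are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical per-type delay weights (illustrative values, not from the paper).
DELAY_WEIGHT = {"elective": 1.0, "urgent": 4.0, "emergency": 10.0}


@dataclass
class Case:
    case_type: str          # key into DELAY_WEIGHT
    reference_start: float  # minutes; pre-schedule time for electives (assumed: arrival time otherwise)
    actual_start: float     # minutes; when the case actually started


def delay_penalty(case: Case) -> float:
    """Type-weighted quadratic penalty on lateness relative to the reference start."""
    delay = max(0.0, case.actual_start - case.reference_start)
    return DELAY_WEIGHT[case.case_type] * delay ** 2


def episode_reward(cases, or_finish_times, shift_end,
                   completion_bonus=100.0, overtime_weight=2.0):
    """Single scalar reward: throughput bonus minus delay and terminal overtime penalties."""
    throughput = completion_bonus * len(cases)
    delays = sum(delay_penalty(c) for c in cases)
    # Terminal overtime: minutes each OR runs past the end of the shift (assumed linear cost).
    overtime = sum(max(0.0, t - shift_end) for t in or_finish_times)
    return throughput - delays - overtime_weight * overtime


completed = [Case("elective", 60.0, 70.0), Case("emergency", 100.0, 100.0)]
reward = episode_reward(completed, or_finish_times=[480.0, 510.0], shift_end=480.0)
```

Because the delay term is quadratic, one long delay costs more than several short ones of the same total length, which is one plausible reason a policy trained on such a reward would spread lateness rather than concentrate it.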

Problem

Research questions and friction points this paper is trying to address.

Develops a multi-agent reinforcement learning framework for intraday surgical scheduling under uncertainty

Addresses balancing elective throughput, urgent demand, delays, setups, and overtime in operating rooms

Proposes a cooperative Markov game approach to optimize real-time OR scheduling decisions

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent reinforcement learning for operating room scheduling

Centralized training with decentralized execution using PPO

Sequential assignment protocol for conflict-free joint schedules
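
A within-epoch sequential assignment of the kind listed above can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's protocol: idle rooms act in a fixed order, each picks the highest-scoring case still waiting, and a picked case is removed before the next room chooses, so no case is assigned twice. The `score` callable is a hypothetical stand-in for the shared learned policy, and the option for a room to stay idle is omitted.

```python
def sequential_assign(idle_rooms, waiting_cases, score):
    """Assign at most one waiting case to each idle room, one room at a time.

    `score(room, case)` is a hypothetical stand-in for the shared policy;
    higher values mean the room prefers that case.
    """
    remaining = list(waiting_cases)
    assignment = {}
    for room in idle_rooms:
        if not remaining:
            break  # more idle rooms than waiting cases
        best = max(remaining, key=lambda case: score(room, case))
        assignment[room] = best
        remaining.remove(best)  # later rooms can no longer pick this case
    return assignment


# Toy priorities standing in for policy outputs (illustrative only).
priority = {"hip_replacement": 1, "appendectomy": 3, "hernia_repair": 2}
schedule = sequential_assign(
    ["OR1", "OR2"],
    list(priority),
    score=lambda room, case: priority[case],
)
```

Processing rooms one at a time reduces the joint decision to a series of single-agent choices, which is why the abstract reports a suboptimality bound for this decomposition rather than claiming it is exact.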

🔎 Similar Papers

Adaptive Task Allocation in Multi-Human Multi-Robot Teams under Team Heterogeneity and Dynamic Information Uncertainty