🤖 AI Summary
Background: Reinforcement learning (RL) policies for dual-vasopressor titration in patients with septic shock often produce dosing recommendations that are clinically impractical and hard to interpret. Method: We propose an end-to-end RL framework featuring: (1) a coupled discrete–continuous–directional action space that makes recommended doses clinically operable; (2) offline conservative Q-learning combined with recurrent neural networks that model temporal dependencies in the replay buffer, improving policy stability and interpretability; and (3) joint training and validation on multi-center ICU time-series data from eICU and MIMIC. Results: On retrospective real-world data, our method improves 28-day survival probability by 15.2% and generates treatment policies consistent with clinical guidelines. It offers a new, trustworthy paradigm for deploying clinical decision support systems (CDSS) in critical-care dual-vasopressor management.
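The three action formulations named in item (1) can be illustrated with a minimal Python sketch for norepinephrine, paired with an on/off vasopressin decision for the dual-drug case. All dose levels, units, step sizes, clip limits, and function names below are hypothetical placeholders; the summary does not say how the paper discretizes or couples the two drugs.

```python
from typing import Tuple

# Hypothetical settings for illustration only; the paper's actual bins, units,
# step sizes and limits are not given in the summary or abstract.
NE_LEVELS = (0.0, 0.05, 0.1, 0.2, 0.4)  # norepinephrine levels, mcg/kg/min (assumed)
NE_MAX = 0.5                             # assumed upper clip for norepinephrine
NE_STEP = 0.05                           # assumed size of one directional step
VP_FIXED = 0.03                          # vasopressin rate in units/min when on (assumed)


def clip_ne(dose: float) -> float:
    """Keep a norepinephrine dose inside [0, NE_MAX]."""
    return min(max(dose, 0.0), NE_MAX)


def discrete_ne(bin_index: int) -> float:
    """Discrete formulation: choose one of a fixed set of dose levels."""
    return NE_LEVELS[bin_index]


def continuous_ne(raw: float) -> float:
    """Continuous formulation: the policy emits a dose directly, which is then clipped."""
    return clip_ne(raw)


def directional_ne(current: float, direction: int) -> float:
    """Directional formulation: decrease (-1), hold (0) or increase (+1) by one step."""
    if direction not in (-1, 0, 1):
        raise ValueError("direction must be -1, 0 or +1")
    return clip_ne(current + direction * NE_STEP)


def dual_action(current_ne: float, ne_direction: int, vp_on: bool) -> Tuple[float, float]:
    """Couple a directional norepinephrine change with an on/off vasopressin decision."""
    return directional_ne(current_ne, ne_direction), (VP_FIXED if vp_on else 0.0)
```

For example, `dual_action(0.1, 1, True)` raises norepinephrine by one assumed step to roughly 0.15 and switches vasopressin on at the assumed fixed rate. A directional action can move the dose by at most one step per decision, which is one plausible reason such formulations read as more operable to clinicians than arbitrary jumps between dose levels.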
📝 Abstract
Reinforcement learning (RL) applications in Clinical Decision Support Systems (CDSS) frequently meet skepticism from practitioners because the dosing decisions they recommend cannot be carried out in practice. We address this challenge with an end-to-end approach for learning optimal drug dosing and control policies for dual vasopressor administration in intensive care unit (ICU) patients with septic shock. To make dosing realistic, we design action spaces that accommodate discrete, continuous, and directional dosing strategies, within a system that combines offline conservative Q-learning with a novel recurrent modeling of the replay buffer to capture temporal dependencies in ICU time-series data. Our comparative analysis of norepinephrine dosing strategies across different action space formulations shows that the designed action spaces improve interpretability and facilitate clinical adoption while preserving efficacy. Empirical results on eICU and MIMIC demonstrate that action space design profoundly influences the learned behavioral policies. The proposed methods improve patient outcomes, with a survival improvement probability of over 15%, while aligning with established clinical protocols.
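The abstract pairs offline conservative Q-learning with recurrent modeling in the replay buffer. A minimal Python sketch of both ideas follows, assuming a discrete action set: a buffer that samples contiguous windows from each ICU stay (so a recurrent Q-network, not shown here, would see ordered history rather than isolated transitions), and the standard discrete-action CQL loss, which adds a log-sum-exp penalty over all actions, minus the Q-value of the logged action, to an ordinary TD error. The class and function names, the transition layout, `seq_len`, `gamma`, and `alpha` are assumptions for illustration, not the paper's implementation.

```python
import math
import random
from typing import Dict, List, Sequence

# Assumed transition layout, e.g. keys "obs", "action", "reward", "done".
Transition = Dict[str, object]


class SequenceReplayBuffer:
    """Stores whole ICU stays and samples contiguous windows for a recurrent Q-network.

    Hypothetical structure: the abstract only states that recurrent modeling in the
    replay buffer is used to capture temporal dependencies.
    """

    def __init__(self, seq_len: int, seed: int = 0) -> None:
        self.seq_len = seq_len
        self.episodes: List[List[Transition]] = []
        self.rng = random.Random(seed)

    def add_episode(self, episode: List[Transition]) -> None:
        # Skip stays that are shorter than one window.
        if len(episode) >= self.seq_len:
            self.episodes.append(episode)

    def sample(self, batch_size: int) -> List[List[Transition]]:
        batch = []
        for _ in range(batch_size):
            ep = self.rng.choice(self.episodes)
            # randint is inclusive, so every start yields a full window.
            start = self.rng.randint(0, len(ep) - self.seq_len)
            batch.append(ep[start:start + self.seq_len])
        return batch


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def cql_loss(q_values: Sequence[float], action: int, reward: float,
             next_q_target: Sequence[float], done: bool,
             gamma: float = 0.99, alpha: float = 1.0) -> float:
    """Discrete-action CQL loss for a single transition.

    q_values: Q(s, .) from the online network; next_q_target: Q_target(s', .).
    """
    target = reward + (0.0 if done else gamma * max(next_q_target))
    td = 0.5 * (q_values[action] - target) ** 2
    # Conservative term: push down all actions (soft-max) and push up the logged one.
    penalty = logsumexp(q_values) - q_values[action]
    return td + alpha * penalty
```

Because the log-sum-exp of Q(s, ·) is never smaller than Q(s, a) for any single action, the penalty is non-negative and approaches zero only when the logged clinician action dominates, which discourages the learned policy from favoring actions that are rare in the retrospective data.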