🤖 AI Summary
To address safety and robustness challenges in Type 1 Diabetes Mellitus (T1DM) glycemic control, which arise from delayed intervention effects and inter-patient physiological heterogeneity, this work proposes a reinforcement learning framework that unifies impulse control and switching control. The authors formulate a constrained Markov decision process (CMDP) augmented with physiologically informed state features, model biologically realistic factors such as insulin decay, and enforce clinical and resource constraints. Theoretically, they provide convergence guarantees. Empirically, on a stylised T1DM control task, the method reduces blood glucose level violations from 22.4% (state of the art) to as low as 10.8%. Although not intended for clinical deployment, the work lays a foundation for safe, temporally aware decision-making in technologies such as automated insulin delivery.
📝 Abstract
Managing physiological variables within clinically safe target zones is a central challenge in healthcare, particularly for chronic conditions such as Type 1 Diabetes Mellitus (T1DM). Reinforcement learning (RL) offers promise for personalising treatment, but struggles with the delayed and heterogeneous effects of interventions. We propose a novel RL framework to study and support decision-making in T1DM technologies, such as automated insulin delivery. Our approach captures the complex temporal dynamics of treatment by unifying two control modalities: \textit{impulse control} for discrete, fast-acting interventions (e.g., insulin boluses), and \textit{switching control} for longer-acting treatments and regime shifts. The core of our method is a constrained Markov decision process augmented with physiological state features, enabling safe policy learning under clinical and resource constraints. The framework incorporates biologically realistic factors, including insulin decay, leading to policies that better reflect real-world therapeutic behaviour. While not intended for clinical deployment, this work establishes a foundation for future safe and temporally aware RL in healthcare. We provide theoretical guarantees of convergence and demonstrate empirical improvements in a stylised T1DM control task, reducing blood glucose level violations from 22.4% (state-of-the-art) to as low as 10.8%.
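To make the two control modalities concrete, here is a minimal Python sketch of a toy glucose model in which a bolus acts as an impulse, a basal regime acts as a switch, and active insulin decays exponentially. It is not the paper's model or simulator: every name and constant (`ToyPatient`, `BASAL_RATES`, `HALF_LIFE_STEPS`, `SENSITIVITY`, `DRIFT`, the 70 to 180 mg/dL target zone) is a hypothetical choice made only for illustration, and `violation_rate` is one plausible way to compute the fraction of time outside the target zone, which may differ from the metric the authors report.

```python
import math
from dataclasses import dataclass

# All constants below are illustrative assumptions, not values from the paper.
BASAL_RATES = (0.0, 0.05, 0.1)  # units of insulin per step for each switching regime
HALF_LIFE_STEPS = 12.0          # assumed half-life of insulin action, in time steps
SENSITIVITY = 2.0               # assumed mg/dL drop per unit of active insulin per step
DRIFT = 1.0                     # assumed mg/dL rise per step without insulin
TARGET = (70.0, 180.0)          # assumed safe glucose zone in mg/dL


@dataclass
class ToyPatient:
    glucose: float = 140.0  # current blood glucose, mg/dL
    iob: float = 0.0        # insulin on board (still active), units
    basal_mode: int = 0     # index into BASAL_RATES


def step(patient, bolus=0.0, new_mode=None):
    """Advance the toy model by one step.

    bolus    -- impulse control: a one-off dose added to active insulin.
    new_mode -- switching control: change the basal regime, or None to keep it.
    """
    if new_mode is not None:
        patient.basal_mode = new_mode
    # Exponential decay of previously delivered insulin.
    decay = math.exp(-math.log(2) / HALF_LIFE_STEPS)
    patient.iob = patient.iob * decay + bolus + BASAL_RATES[patient.basal_mode]
    # Glucose drifts upward and is pulled down by active insulin.
    patient.glucose = patient.glucose + DRIFT - SENSITIVITY * patient.iob
    return patient


def violation_rate(trace):
    """Fraction of glucose readings that fall outside the target zone."""
    low, high = TARGET
    outside = sum(1 for g in trace if not (low <= g <= high))
    return outside / len(trace)
```

In a constrained MDP built on such a model, the state could include `iob` alongside `glucose`, so that a policy can account for insulin that has been delivered but has not yet fully acted; a bound on `violation_rate` could then serve as the safety constraint.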