Reinforcement Learning for Target Zone Blood Glucose Control

📅 2025-08-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address safety and robustness challenges in Type 1 Diabetes Mellitus (T1DM) glycemic control, which arise from delayed intervention effects and inter-patient physiological heterogeneity, this work proposes a reinforcement learning framework that unifies impulse control (discrete, fast-acting interventions such as insulin boluses) and switching control (longer-acting treatments and regime shifts). The framework is built on a constrained Markov decision process (CMDP) augmented with physiological state features; it models biologically realistic factors such as insulin decay and learns policies under clinical and resource constraints. The authors provide theoretical convergence guarantees. On a stylised T1DM control task, the method reduces blood glucose level violations from 22.4% (the reported state of the art) to as low as 10.8%. The work is not intended for clinical deployment; it is positioned as a foundation for future safe, temporally aware RL in healthcare.

📝 Abstract
Managing physiological variables within clinically safe target zones is a central challenge in healthcare, particularly for chronic conditions such as Type 1 Diabetes Mellitus (T1DM). Reinforcement learning (RL) offers promise for personalising treatment, but struggles with the delayed and heterogeneous effects of interventions. We propose a novel RL framework to study and support decision-making in T1DM technologies, such as automated insulin delivery. Our approach captures the complex temporal dynamics of treatment by unifying two control modalities: impulse control for discrete, fast-acting interventions (e.g., insulin boluses), and switching control for longer-acting treatments and regime shifts. The core of our method is a constrained Markov decision process augmented with physiological state features, enabling safe policy learning under clinical and resource constraints. The framework incorporates biologically realistic factors, including insulin decay, leading to policies that better reflect real-world therapeutic behaviour. While not intended for clinical deployment, this work establishes a foundation for future safe and temporally-aware RL in healthcare. We provide theoretical guarantees of convergence and demonstrate empirical improvements in a stylised T1DM control task, reducing blood glucose level violations from 22.4% (state-of-the-art) to as low as 10.8%.
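
The abstract reports its headline result as a rate of blood glucose level violations, but does not define the metric on this page. As a rough illustration only, one plausible reading is the fraction of readings that fall outside a target zone. The sketch below assumes a zone of 70 to 180 mg/dL (a commonly used clinical target range, not a value taken from the paper); the function name `violation_rate` and the sample readings are hypothetical.

```python
from typing import Sequence


def violation_rate(
    glucose_mg_dl: Sequence[float],
    low: float = 70.0,    # assumed lower bound of the target zone (mg/dL)
    high: float = 180.0,  # assumed upper bound of the target zone (mg/dL)
) -> float:
    """Return the fraction of readings outside the closed interval [low, high]."""
    if len(glucose_mg_dl) == 0:
        raise ValueError("need at least one glucose reading")
    outside = sum(1 for g in glucose_mg_dl if g < low or g > high)
    return outside / len(glucose_mg_dl)


# Made-up readings for illustration: 65, 185 and 200 lie outside the zone.
readings = [65.0, 90.0, 120.0, 185.0, 150.0, 110.0, 200.0, 100.0]
print(violation_rate(readings))  # 3 of 8 readings are outside -> 0.375
```

Under this reading, a drop from 22.4% to 10.8% would mean that roughly half as many time steps are spent outside the zone; the paper's own definition may differ.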
Problem

Research questions and friction points this paper is trying to address.

Developing RL for personalized blood glucose control in T1DM
Addressing delayed and heterogeneous effects of interventions
Unifying impulse and switching control for temporal dynamics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified impulse and switching control for T1DM
Constrained MDP with physiological state features
Biologically realistic insulin decay modeling
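
To make the three ideas above more concrete, here is a minimal toy sketch of how a single transition might combine an impulse action (a bolus), a switching action (a basal-rate regime), and exponentially decaying active insulin. It is not the paper's model: the names `InsulinState`, `BASAL_RATES` and `step`, the half-life, and the insulin cap are all assumptions made for illustration. It also swaps the CMDP's constraint handling for a much simpler hard clip on the bolus, which is a different technique.

```python
import math
from dataclasses import dataclass

# Hypothetical basal regimes in units per hour; illustrative values only.
BASAL_RATES = (0.0, 0.5, 1.0)


@dataclass
class InsulinState:
    insulin_on_board: float  # active insulin in units (assumed state feature)
    basal_mode: int          # index into BASAL_RATES (current switching regime)


def step(
    state: InsulinState,
    bolus_units: float,            # impulse control: one-off dose at this step
    new_mode: int,                 # switching control: regime for this step
    dt_hours: float = 5.0 / 60.0,  # assumed 5-minute decision interval
    half_life_hours: float = 1.0,  # assumed insulin half-life
    max_iob: float = 10.0,         # assumed safety cap on active insulin
) -> tuple[InsulinState, float]:
    """Advance one step; return the next state and the bolus actually given."""
    if bolus_units < 0:
        raise ValueError("bolus_units must be non-negative")
    if new_mode not in range(len(BASAL_RATES)):
        raise ValueError("unknown basal mode")

    # Exponential decay of the insulin that is already active.
    decay = math.exp(-math.log(2.0) * dt_hours / half_life_hours)
    iob = state.insulin_on_board * decay

    # Continuous delivery under the selected regime.
    iob += BASAL_RATES[new_mode] * dt_hours

    # Hard clip standing in for a safety constraint: never exceed max_iob.
    allowed = max(0.0, min(bolus_units, max_iob - iob))
    iob += allowed

    return InsulinState(insulin_on_board=iob, basal_mode=new_mode), allowed


# Usage: request a 10-unit bolus with basal delivery switched off.
state = InsulinState(insulin_on_board=9.0, basal_mode=1)
next_state, given = step(state, bolus_units=10.0, new_mode=0,
                         dt_hours=1.0, half_life_hours=1.0)
print(next_state, given)
```

Keeping active insulin in the state is one way to expose delayed effects to a policy: the agent can see how much earlier dosing is still acting before it chooses the next bolus or regime.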