Flexible Blood Glucose Control: Offline Reinforcement Learning from Human Feedback

πŸ“… 2025-01-27
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Existing reinforcement learning (RL) methods for insulin regulation in type 1 diabetes struggle to incorporate patient-specific preferences, resulting in non-personalized and clinically uninterpretable control policies. To address this, we propose PAINT, a framework that integrates sketch-based reward modeling with safety-constrained offline RL, enabling continuous preference tuning, preprandial glucose anticipation, and device fault tolerance. PAINT incorporates expert-guided preference annotation, safety-aware action constraints, and few-shot robust training. In silico evaluation demonstrates a 15% reduction in glycemic risk, a 10% improvement in postprandial time-in-target range, and a 1.6% decrease in glucose variability under device anomalies. The method also exhibits strong robustness to annotation noise and low-data regimes. This work represents the first approach achieving simultaneous safety guarantees, adjustable personalization, and clinical interpretability in closed-loop glucose control.

πŸ“ Abstract
Reinforcement learning (RL) has demonstrated success in automating insulin dosing in simulated type 1 diabetes (T1D) patients but is currently unable to incorporate patient expertise and preference. This work introduces PAINT (Preference Adaptation for INsulin control in T1D), an original RL framework for learning flexible insulin dosing policies from patient records. PAINT employs a sketch-based approach for reward learning, where past data is annotated with a continuous reward signal to reflect the patient's desired outcomes. The labelled data trains a reward model, which informs the actions of a novel safety-constrained offline RL algorithm, designed to restrict actions to a safe strategy and enable preference tuning via a sliding scale. In-silico evaluation shows PAINT achieves common glucose goals through simple labelling of desired states, reducing glycaemic risk by 15% over a commercial benchmark. Action labelling can also be used to incorporate patient expertise, demonstrating an ability to pre-empt meals (+10% time-in-range post-meal) and address certain device errors (-1.6% variance post-error) with patient guidance. These results hold under realistic conditions, including limited samples, labelling errors, and intra-patient variability. This work illustrates PAINT's potential in real-world T1D management and, more broadly, in any task requiring rapid and precise preference learning under safety constraints.
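The pipeline the abstract describes (continuous reward labels sketched over past data, a reward model fit to those labels, and action selection restricted to a safe set with a sliding-scale preference weight) can be illustrated with a minimal toy sketch. Everything below is an illustrative assumption: the function names, the linear reward model, and the crude insulin-effect term are stand-ins, not the paper's actual implementation.

```python
# Toy sketch of a PAINT-style pipeline (all components are illustrative):
# (1) label past glucose readings with a continuous reward "sketch",
# (2) fit a simple reward model on the labelled data,
# (3) pick insulin doses from a restricted safe set, with a sliding-scale
#     weight trading off the learned preference against a safe baseline.

def sketch_label(glucose_mgdl, target=110.0, width=40.0):
    """Continuous reward label: 1 at the target glucose, decaying linearly."""
    return max(0.0, 1.0 - abs(glucose_mgdl - target) / width)

def fit_reward_model(states, labels):
    """Fit reward ~ a*glucose + b by least squares (toy stand-in)."""
    n = len(states)
    mx, my = sum(states) / n, sum(labels) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(states, labels))
    var = sum((x - mx) ** 2 for x in states) or 1.0
    a = cov / var
    b = my - a * mx
    return lambda g: a * g + b

def choose_dose(glucose, reward_model, safe_doses, baseline_dose, pref_weight):
    """Score each dose in the safe set; blend learned reward with a
    conservative term that keeps the dose near a safe baseline."""
    def score(dose):
        predicted_next = glucose - 15.0 * dose      # crude insulin effect
        learned = reward_model(predicted_next)
        conservative = -abs(dose - baseline_dose)   # stay near baseline
        return pref_weight * learned + (1 - pref_weight) * conservative
    return max(safe_doses, key=score)

# Label a small history of glucose readings and fit the reward model.
history = [70, 90, 110, 150, 200, 250]
labels = [sketch_label(g) for g in history]
model = fit_reward_model(history, labels)

# pref_weight is the "sliding scale": higher values weight the learned
# preference more heavily relative to the conservative baseline.
dose = choose_dose(glucose=190, reward_model=model,
                   safe_doses=[0.0, 0.5, 1.0, 1.5],
                   baseline_dose=0.5, pref_weight=0.7)
```

Raising `pref_weight` toward 1 lets the learned preference dominate and pushes the policy toward more aggressive correction of high glucose, while lowering it keeps doses pinned near the conservative baseline, mirroring the sliding-scale tuning the abstract describes.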
Problem

Research questions and friction points this paper is trying to address.

Reinforcement Learning
Type 1 Diabetes Management
Personalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

PAINT
Personalized Insulin Delivery
Adaptive Algorithm
Harry Emerson β€” University of Bristol, United Kingdom
Sam Gordon James β€” University of Bristol, United Kingdom
Matthew Guy β€” University of Bristol, United Kingdom; University Hospital Southampton, United Kingdom
Ryan McConville β€” University of Bristol, United Kingdom
Machine Learning
AI