Learning to Compress Time-to-Control: A Reinforcement Learning Framework for Chronic Disease Management

📅 2026-05-10

🤖 AI Summary
This study addresses recurring failure modes of reinforcement learning in healthcare (sparse rewards, unreliable off-policy evaluation, and the deployment-simulation gap) by formulating chronic disease management as a constrained Markov decision process whose objective is to minimize time-to-control (TTC). Execution intensity (ε) and clinician capability (κ) enter as structural components of a two-loop architecture that couples clinical preference learning with offline reinforcement learning: ε bounds which actions are available, and κ weights offline-data transitions during training. The reward is tiered and calibrated to the CMS ACCESS Model. In simulations on synthetic state machines for hypertension and type 2 diabetes (T2D), capability-weighted offline RL outperforms both uniform-weighted offline RL and the behavior policy by 15 percentage points on T2D TTC, and ε-aware policies generalize across deployment regimes while ε-naive policies do not.
📝 Abstract
Reinforcement learning (RL) in healthcare has had mixed results, with reward sparsity, unreliable off-policy evaluation, and the deployment-simulation gap as recurring failure modes. We argue that chronic disease management is structurally a more tractable RL setting than the acute-care problems the field has primarily studied, but only if the problem is formalized to exploit chronic care's properties. We propose such a formalization. The agent's objective is to compress time-to-control (TTC) under a tiered reward calibrated to the CMS ACCESS Model. Two quantities from our companion preference-learning paper [Singh et al. 2026] enter as load-bearing structural elements: the execution intensity ε bounds action availability under a constrained Markov Decision Process, and the clinician capability κ weights offline-data transitions during RL training. Together they couple preference learning and RL into a two-loop architecture. We present simulation results on synthetic state machines for hypertension and type 2 diabetes (T2D). Capability-weighted offline RL outperforms uniform-weighted offline RL and the behavior policy by 15 percentage points on T2D TTC; the uniform-weighted formulation (the standard in existing healthcare RL) underperforms even the heterogeneous behavior policy. ε-aware policies generalize across deployment regimes while ε-naive policies do not.
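The two structural roles described in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm, which the abstract does not specify. It assumes a tabular offline Q-learning update, treats κ as a per-transition scalar that scales the update, and models ε as a threshold on a hypothetical per-action "required intensity" value. All function names, the two-state environment, and the numbers below are illustrative assumptions.

```python
import numpy as np

def allowed_actions(epsilon, action_intensity):
    """Boolean mask of actions whose (hypothetical) required intensity is within epsilon."""
    return action_intensity <= epsilon

def weighted_q_learning(transitions, kappa, n_states, n_actions, mask,
                        gamma=0.95, lr=0.1, epochs=50):
    """Tabular offline Q-learning where each update is scaled by the weight kappa.

    Assumes at least one action is feasible under `mask`.
    """
    q = np.zeros((n_states, n_actions))
    for _ in range(epochs):
        for (s, a, r, s_next, done), w in zip(transitions, kappa):
            if not mask[a]:
                continue  # logged action is outside the epsilon-feasible set
            # Bootstrap only over feasible actions in the next state.
            best_next = 0.0 if done else q[s_next, mask].max()
            target = r + gamma * best_next
            q[s, a] += lr * w * (target - q[s, a])
    return q

def greedy_policy(q, mask):
    """Greedy action per state, restricted to feasible actions."""
    masked = np.where(mask, q, -np.inf)
    return masked.argmax(axis=1)

# Toy data: state 0 = uncontrolled, state 1 = controlled (terminal).
# Each transition is (state, action, reward, next_state, done).
transitions = [(0, 0, 1.0, 1, True), (0, 1, 0.0, 0, False)]
kappa = np.array([1.0, 0.5])             # assumed per-transition clinician weights
action_intensity = np.array([0.2, 0.8])  # assumed intensity required by each action

mask = allowed_actions(0.5, action_intensity)
q = weighted_q_learning(transitions, kappa, n_states=2, n_actions=2, mask=mask)
policy = greedy_policy(q, mask)
```

With ε = 0.5 only the low-intensity action is feasible, so the second logged transition is ignored and the greedy policy can never select the masked action. Setting every κ to zero leaves the Q-table at its initial values, which shows how down-weighting low-capability data removes its influence in this sketch.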
Problem

Research questions and friction points this paper is trying to address.

chronic disease management
time-to-control
reinforcement learning
reward sparsity
off-policy evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

time-to-control
offline reinforcement learning
preference learning
constrained MDP
chronic disease management