🤖 AI Summary
This work proposes a conservative continuous-time stochastic control framework for optimizing treatments from irregularly sampled patient trajectories, a setting in which model misspecification can lead the optimized policy to extrapolate beyond the data. Patient dynamics are modeled as a controlled stochastic differential equation, and a signature-based Maximum Mean Discrepancy (MMD) regularizer on path space penalizes treatment plans whose induced trajectory distribution deviates from the observed trajectories. The resulting objective minimizes a computable upper bound on the true cost. Experiments on benchmark datasets show improved robustness and performance compared to non-conservative baselines.
📝 Abstract
We develop a conservative continuous-time stochastic control framework for treatment optimization from irregularly sampled patient trajectories. The unknown patient dynamics are modeled as a controlled stochastic differential equation with treatment as a continuous-time control. Naive model-based optimization can exploit model errors and propose out-of-support controls, so optimizing the estimated dynamics may not optimize the true dynamics. To limit extrapolation, we add a consistent signature-based MMD regularizer on path space that penalizes treatment plans whose induced trajectory distribution deviates from observed trajectories. The resulting objective minimizes a computable upper bound on the true cost. Experiments on benchmark datasets show improved robustness and performance compared to non-conservative baselines.
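To make the path-space penalty concrete, the following is a minimal sketch and not the authors' implementation. It assumes a depth-2 truncated signature of the time-augmented, piecewise-linear interpolation of each trajectory, with a plain linear kernel on those features. This is a simplification: the abstract does not specify the kernel, and a truncated feature map does not carry the consistency property it mentions. The names `truncated_signature`, `signature_mmd2`, `penalized_cost`, and the weight `lam` are hypothetical.

```python
import numpy as np

def truncated_signature(times, values):
    """Depth-2 signature of the piecewise-linear path through (t_k, x_k).

    Time is added as an extra channel, so irregular sampling times are
    part of the path rather than something to be resampled away.
    """
    path = np.column_stack([times, values])  # shape (n_points, 1 + dim)
    dim = path.shape[1]
    level1 = np.zeros(dim)           # running total increment
    level2 = np.zeros((dim, dim))    # running iterated integrals
    for delta in np.diff(path, axis=0):
        # Chen's identity for appending one linear segment.
        level2 += np.outer(level1, delta) + 0.5 * np.outer(delta, delta)
        level1 += delta
    return np.concatenate([level1, level2.ravel()])

def signature_mmd2(paths_p, paths_q):
    """Biased squared MMD with a linear kernel on signature features.

    With an explicit feature map, this is the squared distance between
    the two sample means of the features.
    """
    feats_p = np.stack([truncated_signature(t, x) for t, x in paths_p])
    feats_q = np.stack([truncated_signature(t, x) for t, x in paths_q])
    diff = feats_p.mean(axis=0) - feats_q.mean(axis=0)
    return float(diff @ diff)

def penalized_cost(model_cost, policy_paths, observed_paths, lam=1.0):
    """Hypothetical conservative objective: model cost plus MMD penalty."""
    return model_cost + lam * signature_mmd2(policy_paths, observed_paths)
```

Each trajectory is passed as a `(times, values)` pair of arrays, so trajectories with different numbers of observations can be compared directly. In this sketch the penalty vanishes when the policy-induced sample matches the observed sample and grows as their mean signature features drift apart.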