Cost-optimal Sequential Testing via Doubly Robust Q-learning

📅 2026-04-13

🤖 AI Summary

This study addresses informative sequential missingness in retrospective clinical data by proposing a doubly robust Q-learning framework for learning cost-optimal testing and stopping policies. The method combines path-specific inverse probability weights, which satisfy a normalization property conditional on the observed history, with auxiliary contrast models to construct orthogonal pseudo-outcomes. These enable unbiased policy learning when either the acquisition (propensity) model or the contrast model is correctly specified. The theoretical analysis establishes oracle inequalities for the stage-wise contrast estimators, along with convergence rates, regret bounds, and misclassification rates for the learned policy. Simulations show improved cost-adjusted performance over weighted and complete-case baselines, and an application to a prostate cancer cohort study illustrates reduced testing cost without loss of predictive accuracy.

📝 Abstract

Clinical decision-making often involves selecting tests that are costly, invasive, or time-consuming, motivating individualized, sequential strategies for what to measure and when to stop ascertaining. We study the problem of learning cost-optimal sequential decision policies from retrospective data, where test availability depends on prior results, inducing informative missingness. Under a sequential missing-at-random mechanism, we develop a doubly robust Q-learning framework for estimating optimal policies. The method introduces path-specific inverse probability weights that account for heterogeneous test trajectories and satisfy a normalization property conditional on the observed history. By combining these weights with auxiliary contrast models, we construct orthogonal pseudo-outcomes that enable unbiased policy learning when either the acquisition model or the contrast model is correctly specified. We establish oracle inequalities for the stage-wise contrast estimators, along with convergence rates, regret bounds, and misclassification rates for the learned policy. Simulations demonstrate improved cost-adjusted performance over weighted and complete-case baselines, and an application to a prostate cancer cohort study illustrates how the method reduces testing cost without compromising predictive accuracy.
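
The double-robustness idea in the abstract can be illustrated with a minimal single-stage sketch. This is not the paper's estimator: it uses the generic augmented inverse probability weighting form m(H) + R / pi(H) * (Y - m(H)), and the function name `dr_pseudo_outcome`, the scalar history `h`, and the simulated data are all assumptions made for illustration. The paper's path-specific weights and stage-wise contrast models are not reproduced here.

```python
import numpy as np

def dr_pseudo_outcome(y, observed, propensity, outcome_model):
    """Generic augmented-IPW pseudo-outcome: m(H) + R / pi(H) * (Y - m(H)).

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Values of Y for unacquired tests never enter the estimate.
    y_filled = np.where(observed == 1, y, 0.0)
    return outcome_model + observed / propensity * (y_filled - outcome_model)

rng = np.random.default_rng(0)
n = 200_000
h = rng.uniform(0.0, 1.0, size=n)            # observed history (assumed scalar)
y = 2.0 * h + rng.normal(0.0, 0.5, size=n)   # outcome; its true mean is 1.0
pi_true = 0.2 + 0.6 * h                      # acquisition probability depends on h
r = rng.binomial(1, pi_true)                 # 1 if the test result was acquired

# Complete-case mean ignores the informative missingness.
complete_case = y[r == 1].mean()
# Correct propensity, deliberately wrong outcome model (all zeros).
dr_good_pi = dr_pseudo_outcome(y, r, pi_true, np.zeros(n)).mean()
# Deliberately wrong propensity (constant 0.5), correct outcome model.
dr_good_m = dr_pseudo_outcome(y, r, np.full(n, 0.5), 2.0 * h).mean()
# Inverse probability weights built from the true propensity.
weight_mean = (r / pi_true).mean()

print(complete_case, dr_good_pi, dr_good_m, weight_mean)
```

In this simulation the complete-case mean drifts away from the true mean of 1.0 because acquisition is more likely at larger `h`, while both doubly robust averages stay close to it even though one nuisance model is wrong in each case. The weights `r / pi_true` average to roughly one, a single-stage analogue of the normalization property the abstract mentions.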

Problem

Research questions and friction points this paper is trying to address.

cost-optimal sequential testing

informative missingness

sequential decision-making

retrospective data

Innovation

Methods, ideas, or system contributions that make the work stand out.

doubly robust Q-learning

sequential testing

inverse probability weighting

informative missingness