🤖 AI Summary
This work addresses the challenge of constructing valid prediction intervals for counterfactual outcomes under runtime confounding, where only a subset of the confounders is measured in the target population. Existing conformal prediction methods require every confounder used for training on the source data to also be measured in the target population, and they risk miscoverage when that requirement is not met. To overcome this limitation, the paper brings semiparametric efficiency theory into the conformal prediction framework, using debiased machine learning to produce prediction intervals that remain valid when some confounders are unmeasured at prediction time. The proposed method addresses the coverage failure caused by these unmeasured confounders and attains the desired coverage rates with faster convergence than standard methods. Experiments on multiple synthetic and semi-synthetic datasets illustrate the utility of the approach relative to standard conformal prediction methods.
📝 Abstract
Data-driven decision making frequently relies on predicting counterfactual outcomes. In practice, researchers commonly train counterfactual prediction models on a source dataset to inform decisions on a possibly separate target population. Conformal prediction has emerged as a popular method for producing assumption-lean prediction intervals for counterfactual outcomes that would arise under different treatment decisions in the target population of interest. However, existing methods require that every confounding factor of the treatment-outcome relationship used for training on the source data also be measured in the target population, risking miscoverage if important confounders are unmeasured in the target population. In this paper, we introduce a computationally efficient debiased machine learning framework that allows for valid prediction intervals when only a subset of confounders is measured in the target population, a common challenge referred to as runtime confounding. Grounded in semiparametric efficiency theory, we show that the resulting prediction intervals achieve desired coverage rates with faster convergence than standard methods. Through numerous synthetic and semi-synthetic experiments, we demonstrate the utility of our proposed method.