🤖 AI Summary
Reliably evaluating the goodness-of-fit of conditional average treatment effect (CATE) estimates derived from observational data remains a critical challenge for applying causal inference in policy and personalized decision-making. This work proposes the CAFE framework, a validation approach that directly targets the CATE estimate, rather than the full outcome model, by leveraging auxiliary randomized controlled trial (RCT) data. CAFE stratifies the trial covariate space using estimated propensity scores and conducts hypothesis tests that compare observationally derived treatment effects with group-level experimental averages. To enhance sensitivity to localized misspecification, it incorporates a maximum-type test statistic; a separate two-stage procedure detects potential unmeasured confounding when both RCT and observational data are available. The framework is compatible with both parametric models and flexible machine learning methods such as causal forests. Extensive numerical experiments indicate that CAFE identifies CATE model mismatches effectively, supporting its use for assessing observationally derived CATE estimates.
📝 Abstract
Conditional average treatment effects (CATEs) are increasingly estimated from observational data and used to guide policy and individualized treatment decisions. Before such estimates can be trusted in practice, their predictive fitness needs to be assessed, yet observational data alone offer limited opportunities for doing so. We propose CATE Assessment via Fitness Evaluation (CAFE), a formal framework that uses evidence from a randomized trial to directly assess the goodness-of-fit of a CATE estimate learned from observational data, rather than that of the full underlying outcome model. CAFE partitions the trial covariate space according to estimated propensity scores (or a similar score) and compares observationally derived conditional treatment effects with group-level experimental averages. The framework accommodates a broad class of CATE learners, including parametric models and flexible machine learning methods such as causal forests and boosting. We establish theoretical guarantees under both the null and alternative hypotheses, and introduce a maximum-type extension to improve sensitivity to localized lack of fit. When both randomized trial and observational data are available, we further develop a two-stage procedure to detect unobserved confounding. Extensive numerical studies demonstrate the utility of the CAFE approach for assessing observationally derived CATE estimates.
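The stratify-and-compare idea described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation or their test statistic: the function name `stratified_cate_check`, the quantile binning, the difference-in-means estimator, and the Bonferroni-adjusted maximum statistic are all assumptions chosen for illustration, and the sketch ignores the sampling uncertainty of the CATE estimate itself. The simulated trial uses a single covariate as a stand-in for an estimated propensity score.

```python
import math
import numpy as np

def stratified_cate_check(score, treat, y, tau_hat, n_groups=5):
    """Compare group-level trial effects with averaged CATE predictions.

    score   : stratification score per trial unit (hypothetical stand-in
              for an estimated propensity score)
    treat   : 0/1 randomized treatment indicator
    y       : observed trial outcome
    tau_hat : CATE predictions from the observational model, evaluated
              on the trial units
    """
    score, treat, y, tau_hat = map(np.asarray, (score, treat, y, tau_hat))
    # Interior quantile cut points define n_groups strata of similar size.
    edges = np.quantile(score, np.linspace(0, 1, n_groups + 1)[1:-1])
    group = np.searchsorted(edges, score, side="right")
    z = np.full(n_groups, np.nan)
    for g in range(n_groups):
        in_g = group == g
        y1 = y[in_g & (treat == 1)]
        y0 = y[in_g & (treat == 0)]
        if len(y1) < 2 or len(y0) < 2:
            continue  # too few units to estimate a variance
        # Experimental benchmark: difference in means within the stratum.
        diff = y1.mean() - y0.mean()
        se = math.sqrt(y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0))
        # Standardized gap between the trial estimate and the mean prediction.
        z[g] = (diff - tau_hat[in_g].mean()) / se
    valid = z[~np.isnan(z)]
    if valid.size == 0:
        raise ValueError("no stratum has enough treated and control units")
    max_abs_z = float(np.max(np.abs(valid)))
    # Two-sided normal tail probability with a Bonferroni adjustment
    # (an assumed, conservative calibration for the maximum statistic).
    p_value = min(1.0, valid.size * math.erfc(max_abs_z / math.sqrt(2)))
    return z, max_abs_z, p_value

# Simulated trial: the true effect increases with the covariate x.
rng = np.random.default_rng(0)
n = 4000
x = rng.uniform(size=n)
treat = rng.integers(0, 2, size=n)
tau_true = 1.0 + 2.0 * x
y = x + treat * tau_true + rng.normal(size=n)

# A well-specified CATE estimate versus a constant (misspecified) one.
z_good, max_good, p_good = stratified_cate_check(x, treat, y, tau_true)
z_bad, max_bad, p_bad = stratified_cate_check(x, treat, y, np.full(n, 2.0))
print(f"well specified: max|z| = {max_good:.2f}, p = {p_good:.3g}")
print(f"misspecified:   max|z| = {max_bad:.2f}, p = {p_bad:.3g}")
```

In this simulation the constant prediction matches the overall average effect but is wrong in the lowest and highest strata, which is the kind of localized lack of fit a maximum-type statistic is meant to pick up.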