🤖 AI Summary
This paper addresses individualized policy learning in causal inference: directly learning the intervention policy that maximizes the policy value from observational data, without explicitly modeling potential outcomes or separately estimating nuisance parameters. The authors propose an end-to-end causal forest framework that unifies conditional average treatment effect (CATE) estimation and policy optimization within a single $\{-1, 1\}$-constrained CATE regression objective, trained jointly by mean squared error minimization while preserving the efficiency and scalability of tree-based models. On standard causal benchmarks, the method achieves significantly higher policy value than existing approaches and consistently outperforms multi-stage baselines, with computational overhead comparable to random forests. The core contributions are (i) a nuisance-parameter-free joint optimization paradigm and (ii) the first end-to-end adaptation of causal forests to policy learning.
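The $\{-1, 1\}$-constrained regression objective mentioned above has a simple mechanical consequence that a small sketch can illustrate. Assuming treatments and policies are coded in $\{-1, 1\}$ and that each unit carries a pseudo-outcome `gamma` whose conditional mean is the CATE (e.g., an inverse-propensity-weighted transformed outcome; this coding and the function names below are illustrative assumptions, not the paper's implementation), the MSE-minimizing $\{-1, 1\}$-valued label inside a leaf is the sign of the leaf mean, which is also the label maximizing the leaf's empirical policy value:

```python
# Illustrative sketch (assumed setup, not the paper's code): in a
# {-1, 1}-restricted regression tree, each leaf picks the constant
# g in {-1, 1} that minimizes the squared error to the pseudo-outcomes.

def leaf_label(gammas):
    """Return g in {-1, 1} minimizing sum((gamma - g)^2) over the leaf."""
    return min((-1, 1), key=lambda g: sum((x - g) ** 2 for x in gammas))

def leaf_policy_value(gammas, g):
    """Empirical analogue of E[pi(X) * tau(X)] within the leaf."""
    return sum(g * x for x in gammas)

gammas = [0.8, -0.2, 1.5, 0.3]   # hypothetical pseudo-outcomes in one leaf
g = leaf_label(gammas)           # -> 1, the sign of the leaf mean
assert g == (1 if sum(gammas) >= 0 else -1)
# The MSE-minimizing label is also the policy-value-maximizing one:
assert leaf_policy_value(gammas, g) >= leaf_policy_value(gammas, -g)
```

This is why the $\{-1, 1\}$ constraint turns an ordinary regression-tree fit into a policy optimizer: minimizing squared error and maximizing policy value select the same leaf labels.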
📝 Abstract
This study proposes an end-to-end algorithm for policy learning in causal inference. We observe data consisting of covariates, treatment assignments, and outcomes, where only the outcome corresponding to the assigned treatment is observed. The goal of policy learning is to train a policy, a function that recommends an optimal treatment for each individual, from the observed data so as to maximize the policy value. In this study, we first show that maximizing the policy value is equivalent to minimizing the mean squared error for the conditional average treatment effect (CATE) under $\{-1, 1\}$-restricted regression models. Based on this finding, we modify the causal forest, an end-to-end CATE estimation algorithm, for policy learning. We refer to our algorithm as the causal-policy forest. Our algorithm has three advantages. First, it is a simple modification of an existing, widely used CATE estimation method; it therefore helps bridge the gap between policy learning and CATE estimation in practice. Second, while existing studies typically estimate nuisance parameters for policy learning as a separate task, our algorithm trains the policy in a more end-to-end manner. Third, as with standard decision trees and random forests, the models can be trained efficiently, avoiding computational intractability.
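The claimed equivalence between policy-value maximization and $\{-1, 1\}$-restricted MSE minimization can be sketched as follows (a hedged reconstruction under assumed notation: treatments coded in $\{-1, 1\}$, potential outcomes $Y(1), Y(-1)$, and CATE $\tau(x) = \mathbb{E}[Y(1) - Y(-1) \mid X = x]$; the paper's exact formulation may differ). For a policy $\pi$ with $\pi(x) \in \{-1, 1\}$, the policy value decomposes as

$$
V(\pi) = \mathbb{E}\left[Y(\pi(X))\right]
       = \mathbb{E}\left[\frac{Y(1) + Y(-1)}{2}\right]
       + \frac{1}{2}\,\mathbb{E}\left[\pi(X)\,\tau(X)\right],
$$

so maximizing $V(\pi)$ amounts to maximizing $\mathbb{E}[\pi(X)\tau(X)]$. For a regression model $g$ restricted to $g(x) \in \{-1, 1\}$, so that $g(X)^2 = 1$,

$$
\mathbb{E}\left[(\tau(X) - g(X))^2\right]
= \mathbb{E}\left[\tau(X)^2\right] + 1 - 2\,\mathbb{E}\left[g(X)\,\tau(X)\right],
$$

where the first two terms do not depend on $g$. Minimizing this MSE over $\{-1, 1\}$-valued $g$ is therefore equivalent to maximizing $\mathbb{E}[g(X)\tau(X)]$, i.e., to maximizing the policy value with $\pi = g$.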