AI Summary
Causal forests suffer from systematic bias in average treatment effect (ATE) estimation because individual trees model treatment effects independently, leading to repeated classification errors across trees. To address this, we propose the Limit Inferior Leaf Interval (LILI) clustering algorithm, which is the first to introduce the set-theoretic concept of limit inferior into causal inference. LILI quantifies interval-wise similarity of covariates across leaf nodes of different trees, thereby establishing inter-tree dependencies and identifying reliable counterfactual pairs. By integrating leaf-level interval clustering with counterfactual matching, LILI achieves provably consistent ATE estimation while significantly improving accuracy and robustness. We establish theoretical consistency of the estimator under standard regularity conditions. Extensive experiments on multiple benchmark datasets demonstrate that LILI consistently outperforms state-of-the-art causal forests and matching-based methods.
Abstract
Causal forest methods are powerful tools in causal inference. Like a traditional random forest in machine learning, a causal forest treats each causal tree independently. However, this independence increases the likelihood that classification errors in one tree are repeated in others, potentially leading to significant bias in causal effect estimation. In this paper, we propose a novel approach that establishes connections between causal trees through the Limit Inferior Leaf Interval (LILI) clustering algorithm. LILIs are constructed from the leaves of all causal trees and emphasize the similarity of the dataset's confounders. When two instances with different treatments are grouped into the same leaf across a sufficient number of causal trees, they are treated as counterfactual outcomes of each other. Through this clustering mechanism, LILI clustering reduces the bias present in traditional causal tree methods and improves the prediction accuracy for the average treatment effect (ATE). By integrating LILIs into a causal forest, we develop an efficient causal inference method. Moreover, we explore several key properties of LILI by relating it to the concepts of limit inferior and limit superior in set theory. Theoretical analysis rigorously proves the convergence of the estimated ATE using LILI clustering. Empirically, extensive comparative experiments demonstrate the superior performance of LILI clustering.
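The pairing rule described in the abstract (two instances with different treatments that share a leaf in a sufficient number of trees are treated as each other's counterfactuals) can be illustrated with a minimal sketch. This is not the authors' LILI algorithm: it does not build leaf intervals over covariates, and the input format (`leaf_ids[t][i]` as the leaf label of instance `i` in tree `t`), the threshold `min_trees`, the function names, and the simple matched-difference ATE are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def co_leaf_counts(leaf_ids):
    """Count, for each pair of instances, the trees in which both share a leaf.

    leaf_ids[t][i] is the leaf label of instance i in tree t
    (a hypothetical input format, not taken from the paper).
    """
    counts = defaultdict(int)
    for tree in leaf_ids:
        members = defaultdict(list)
        for i, leaf in enumerate(tree):
            members[leaf].append(i)
        for group in members.values():
            # group is in increasing index order, so every key has i < j
            for i, j in combinations(group, 2):
                counts[(i, j)] += 1
    return counts


def counterfactual_pairs(leaf_ids, treatment, min_trees):
    """Keep pairs with different treatments that co-occur in >= min_trees trees."""
    counts = co_leaf_counts(leaf_ids)
    return sorted(
        (i, j)
        for (i, j), c in counts.items()
        if c >= min_trees and treatment[i] != treatment[j]
    )


def matched_ate(pairs, treatment, outcome):
    """Illustrative ATE: impute each matched unit's missing potential outcome
    with the mean outcome of its partners, then average treated minus control.
    This estimator is an assumption of the sketch, not the paper's."""
    partners = defaultdict(list)
    for i, j in pairs:
        partners[i].append(j)
        partners[j].append(i)
    if not partners:
        raise ValueError("no counterfactual pairs found")
    effects = []
    for i, matched in partners.items():
        imputed = sum(outcome[j] for j in matched) / len(matched)
        if treatment[i] == 1:
            effects.append(outcome[i] - imputed)
        else:
            effects.append(imputed - outcome[i])
    return sum(effects) / len(effects)


# Toy data: 3 trees, 4 instances (all values are made up for illustration).
leaf_ids = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
treatment = [1, 0, 1, 0]
outcome = [5.0, 3.0, 6.0, 2.0]

pairs = counterfactual_pairs(leaf_ids, treatment, min_trees=2)
ate = matched_ate(pairs, treatment, outcome)
```

Raising `min_trees` makes the rule stricter: a pair must agree across more trees before it counts as a counterfactual match. This loosely echoes the set-theoretic limit inferior, which collects the elements belonging to all but finitely many sets of a sequence. How the paper formalizes that link is not stated in the abstract.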