🤖 AI Summary
Problem: Estimating the average causal effect (ACE) in directed acyclic graphs (DAGs) with latent variables remains challenging, particularly for continuous variables. Existing estimators rely on density estimation and numerical integration, can produce estimates outside the parameter space of the target estimand, and lack asymptotic guarantees when nuisance functions are fit with flexible machine learning models.
Method: We propose a one-step corrected plug-in estimator and a targeted minimum loss-based estimator (TMLE) for DAGs that satisfy the treatment primal fixability criterion, a class that extends the classical back-door and front-door criteria. Our framework combines nonparametric/semiparametric modeling, stable density ratio estimation, and estimation steps that keep estimates within the target parameter space.
Contribution/Results: The estimators are compatible with machine learning nuisance fits and achieve asymptotic linearity, double robustness, √n-consistency, and semiparametric efficiency while staying within the bounds of the target parameter space. We characterize the L₂(P)-convergence rate requirements that nuisance function estimators must meet for these guarantees to hold. An open-source R package, flexCausal, implements automated identification and estimation.
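To make the L₂(P) rate requirement concrete, the following is an illustrative sketch for the simplest special case only (back-door adjustment with binary treatment A, covariates X, outcome Y), not the paper's general result. The symbols π(X) = P(A = 1 | X), μ(1, X) = E[Y | A = 1, X], and the positivity bound ε are notation assumed here for illustration. The one-step estimator of ψ₁ = E[μ(1, X)] and its second-order remainder (with nuisance estimates held fixed) are:

```latex
\hat\psi_1 \;=\; \frac{1}{n}\sum_{i=1}^{n}\left[\hat\mu(1,X_i) + \frac{A_i}{\hat\pi(X_i)}\bigl\{Y_i - \hat\mu(1,X_i)\bigr\}\right],
\qquad
R_2 \;=\; E\!\left[\frac{\pi(X)-\hat\pi(X)}{\hat\pi(X)}\,\bigl\{\mu(1,X)-\hat\mu(1,X)\bigr\}\right],

|R_2| \;\le\; \frac{1}{\varepsilon}\,
\lVert \hat\pi - \pi \rVert_{L^2(P)}\,
\lVert \hat\mu(1,\cdot) - \mu(1,\cdot) \rVert_{L^2(P)}
\quad\text{whenever } \hat\pi \ge \varepsilon > 0 .
```

In this special case the remainder is negligible at the √n scale when the product of the two L₂(P) errors is o_P(n^{-1/2}), for instance when each nuisance converges faster than n^{-1/4}. The remainder also vanishes if either nuisance is estimated consistently, which is the source of double robustness. The conditions established in the paper cover a broader class of graphs and involve additional nuisance functions.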
📝 Abstract
The identification theory for causal effects in directed acyclic graphs (DAGs) with hidden variables is well-developed, but methods for estimating and inferring functionals beyond the g-formula remain limited. Previous studies have proposed semiparametric estimators for identifiable functionals in a broad class of DAGs with hidden variables. While demonstrating double robustness in some models, existing estimators face challenges, particularly with density estimation and numerical integration for continuous variables, and their estimates may fall outside the parameter space of the target estimand. Their asymptotic properties are also underexplored, especially when using flexible statistical and machine learning models for nuisance estimation. This study addresses these challenges by introducing novel one-step corrected plug-in and targeted minimum loss-based estimators of causal effects for a class of DAGs that extend classical back-door and front-door criteria (known as the treatment primal fixability criterion in prior literature). These estimators leverage machine learning to minimize modeling assumptions while ensuring key statistical properties such as asymptotic linearity, double robustness, efficiency, and staying within the bounds of the target parameter space. We establish conditions for nuisance functional estimates in terms of L2(P)-norms to achieve root-n consistent causal effect estimates. To facilitate practical application, we have developed the flexCausal package in R.
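As a rough illustration of what a one-step corrected plug-in estimator looks like, the sketch below handles only the classical back-door special case with a binary treatment, on simulated data. It is not the flexCausal API and not the paper's general estimator: the function names `fit_logistic` and `one_step_ace`, the simulated data-generating process, and the simple parametric nuisance models are all assumptions made for illustration. With flexible machine learning learners, one would typically add sample splitting (cross-fitting), which is omitted here for brevity.

```python
import numpy as np

def fit_logistic(X, a, n_iter=25):
    """Hypothetical helper: logistic regression fit by Newton's method."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        grad = X.T @ (a - p)                # score of the log-likelihood
        hess = X.T @ (X * w[:, None])       # negative Hessian
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def one_step_ace(x, a, y, clip=0.01):
    """Hypothetical helper: plug-in and one-step (AIPW) ACE estimates
    under back-door adjustment for a single covariate x."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])

    # Nuisance 1: propensity score pi(x) = P(A = 1 | X = x), clipped for stability.
    pi = 1.0 / (1.0 + np.exp(-X @ fit_logistic(X, a)))
    pi = np.clip(pi, clip, 1.0 - clip)

    # Nuisance 2: outcome regression mu(a, x), here linear in (1, x, a).
    coef, *_ = np.linalg.lstsq(np.column_stack([X, a]), y, rcond=None)
    mu1 = np.column_stack([X, np.ones(n)]) @ coef
    mu0 = np.column_stack([X, np.zeros(n)]) @ coef
    mu_a = np.where(a == 1, mu1, mu0)

    # Plug-in (g-formula) estimate.
    plug_in = np.mean(mu1 - mu0)

    # One-step correction: add the sample mean of the weighted residuals,
    # i.e. average the uncentered efficient influence function.
    eif = mu1 - mu0 + (a / pi - (1 - a) / (1.0 - pi)) * (y - mu_a)
    est = eif.mean()
    se = eif.std(ddof=1) / np.sqrt(n)       # influence-function-based standard error
    return plug_in, est, se

# Simulated example (assumed data-generating process; true ACE is 2).
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
a = rng.binomial(1, 1.0 / (1.0 + np.exp(-0.5 * x)))
y = 2.0 * a + x + rng.normal(size=n)

plug_in, est, se = one_step_ace(x, a, y)
print(f"plug-in: {plug_in:.3f}  one-step: {est:.3f}  (SE {se:.3f})")
```

The correction term is what gives the estimator its double robustness in this setting: it remains consistent if either the propensity model or the outcome regression is correct. The standard error shown comes from the empirical variance of the estimated influence function.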