🤖 AI Summary
Existing causal generative models (CGMs) either assume that there are no hidden confounders or are tailored to a particular query and graph, so they cannot handle confounded causal queries in a uniform way.
Method: We propose DeCaFlow, a general-purpose CGM that requires only observational data and a causal graph and is trained once, via amortized learning, to model the underlying causal mechanisms in the presence of hidden confounders. It integrates do-calculus, proxy-variable identification, and amortized inference within a normalizing flow framework.
Contribution/Results: To the authors' knowledge, this is the first work to show that a confounded counterfactual query is identifiable whenever its interventional counterpart is. DeCaFlow uniformly supports all causal queries that admit either a valid adjustment set or sufficiently informative proxy variables, including interventional and counterfactual queries. Evaluated on diverse benchmarks, including Ecoli70, it outperforms existing approaches. It scales to tens of observed variables, hundreds of causal queries, and three independent hidden confounders, demonstrating out-of-the-box flexibility.
📝 Abstract
Causal generative models (CGMs) have recently emerged as capable approaches to simulate the causal mechanisms generating our observations, enabling causal inference. Unfortunately, existing approaches either are overly restrictive, assuming the absence of hidden confounders, or lack generality, being tailored to a particular query and graph. In this work, we introduce DeCaFlow, a CGM that accounts for hidden confounders in a single amortized training process using only observational data and the causal graph. Importantly, DeCaFlow can provably identify all causal queries with a valid adjustment set or sufficiently informative proxy variables. Remarkably, for the first time to our knowledge, we show that a confounded counterfactual query is identifiable, and thus solvable by DeCaFlow, as long as its interventional counterpart is as well. Our empirical results on diverse settings (including the Ecoli70 dataset, with 3 independent hidden confounders, tens of observed variables and hundreds of causal queries) show that DeCaFlow outperforms existing approaches, while demonstrating its out-of-the-box flexibility.
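To make the notion of a "valid adjustment set" concrete, the following is a minimal sketch of backdoor adjustment on a toy discrete model. It is not DeCaFlow and does not involve hidden confounders or proxy variables; the variables Z, T, Y and all probability values are hypothetical, chosen only for illustration. It shows how adjusting for an observed confounder Z changes the answer relative to naive conditioning.

```python
# Hypothetical toy model (not from the paper): binary Z confounds T and Y.
# Z -> T, Z -> Y, T -> Y, with Z observed, so {Z} is a valid adjustment set.
p_z = {0: 0.5, 1: 0.5}                      # P(Z = z)
p_t1_given_z = {0: 0.2, 1: 0.8}             # P(T = 1 | Z = z)
p_y1_given_tz = {                           # P(Y = 1 | T = t, Z = z)
    (0, 0): 0.1, (0, 1): 0.5,
    (1, 0): 0.3, (1, 1): 0.7,
}


def p_t_given_z(t, z):
    """P(T = t | Z = z) for binary T."""
    return p_t1_given_z[z] if t == 1 else 1.0 - p_t1_given_z[z]


def interventional(t):
    """Backdoor adjustment: P(Y=1 | do(T=t)) = sum_z P(Y=1 | t, z) P(z)."""
    return sum(p_y1_given_tz[(t, z)] * p_z[z] for z in p_z)


def observational(t):
    """Naive conditioning: P(Y=1 | T=t) = sum_z P(Y=1 | t, z) P(z | t)."""
    p_t = sum(p_t_given_z(t, z) * p_z[z] for z in p_z)
    return sum(
        p_y1_given_tz[(t, z)] * p_t_given_z(t, z) * p_z[z] / p_t
        for z in p_z
    )


# The adjusted effect and the naive contrast differ because Z confounds T and Y.
causal_effect = interventional(1) - interventional(0)
naive_contrast = observational(1) - observational(0)
print(f"adjusted effect: {causal_effect:.2f}, naive contrast: {naive_contrast:.2f}")
```

In this toy setting the naive contrast overstates the effect because units with Z = 1 are both more likely to be treated and more likely to have Y = 1. When the confounder is hidden, this formula is unavailable, which is the case the abstract addresses through proxy variables.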