🤖 AI Summary
Estimating unbiased causal effects from multi-source observational data remains challenging due to confounding bias, post-treatment variables, and unobserved confounders.
Method: This paper proposes RAMEN, a novel algorithm that requires neither prior knowledge of the causal graph nor explicit causal structure learning. Leveraging heterogeneity across multiple environments, RAMEN integrates invariant learning with doubly robust estimation theory. It achieves unbiased causal effect identification under a mild assumption—namely, that at least one causal parent of either the treatment or the outcome is observed and exhibits cross-environment invariance.
Contribution/Results: Compared to state-of-the-art methods, RAMEN significantly improves estimation accuracy and reduces bias on both synthetic and real-world datasets. Empirical results demonstrate its strong robustness and practical efficacy even when the underlying causal structure is unknown.
📝 Abstract
Practical and ethical constraints often require the use of observational data for causal inference, particularly in medicine and social sciences. Yet, observational datasets are prone to confounding, potentially compromising the validity of causal conclusions. While it is possible to correct for biases if the underlying causal graph is known, this is rarely a feasible ask in practical scenarios. A common strategy is to adjust for all available covariates, yet this approach can yield biased treatment effect estimates, especially when post-treatment or unobserved variables are present. We propose RAMEN, an algorithm that produces unbiased treatment effect estimates by leveraging the heterogeneity of multiple data sources without the need to know or learn the underlying causal graph. Notably, RAMEN achieves doubly robust identification: it can identify the treatment effect whenever the causal parents of the treatment or those of the outcome are observed, and the node whose parents are observed satisfies an invariance assumption. Empirical evaluations on synthetic and real-world datasets show that our approach outperforms existing methods.