🤖 AI Summary
Causal inference that fuses experimental and observational data can be biased because it rests on two untestable assumptions: external validity and ignorability. This paper introduces a double machine learning (DML) framework that makes violations of these assumptions testable. Our method jointly models both data sources via a residual-debiasing estimator that is semiparametrically efficient and consistent when only one of the two assumptions is violated. We prove a “no-free-lunch” theorem showing that consistent estimation requires correctly identifying which assumption is violated. In comparative analyses against existing data fusion methods and in three real-world case studies, our approach compares favorably with the alternatives. It also serves as a diagnostic for assumption violations while retaining its theoretical guarantees. The framework thus offers both theoretical foundations and practical tools for causal extrapolation.
📝 Abstract
Experimental and observational studies often lack validity because they rely on untestable assumptions. We propose a double machine learning approach to combine experimental and observational studies, allowing practitioners to test for assumption violations and estimate treatment effects consistently. Our framework tests for violations of external validity and ignorability under milder assumptions. When only one of these assumptions is violated, we provide semiparametrically efficient treatment effect estimators. However, our no-free-lunch theorem shows that consistent treatment effect estimation requires accurately identifying which assumption is violated. Through comparative analyses, we show that our framework outperforms existing data fusion methods. Three real-world case studies further illustrate the practical utility of our approach in empirical research.
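For readers unfamiliar with double machine learning, the following is a minimal sketch of the generic cross-fitted "partialling-out" estimator for a partially linear model. It is not the fusion estimator proposed in the paper, whose details the abstract does not give. The function names (`dml_partial_out`, `_fit_predict`), the quadratic least-squares nuisance learner, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def _fit_predict(X_train, y_train, X_test):
    """Hypothetical nuisance learner: least squares on [1, X, X^2].

    Any flexible regressor could be swapped in here.
    """
    def feats(X):
        return np.column_stack([np.ones(len(X)), X, X ** 2])

    coef, *_ = np.linalg.lstsq(feats(X_train), y_train, rcond=None)
    return feats(X_test) @ coef


def dml_partial_out(y, d, X, n_folds=5, seed=0):
    """Cross-fitted partialling-out estimate of theta in y = theta*d + g(X) + e."""
    rng = np.random.default_rng(seed)
    n = len(y)
    folds = np.array_split(rng.permutation(n), n_folds)
    y_res = np.empty(n)
    d_res = np.empty(n)
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        # Residualize outcome and treatment using models fit on the other folds.
        y_res[test_idx] = y[test_idx] - _fit_predict(X[train], y[train], X[test_idx])
        d_res[test_idx] = d[test_idx] - _fit_predict(X[train], d[train], X[test_idx])
    # Regress outcome residuals on treatment residuals.
    theta = (d_res @ y_res) / (d_res @ d_res)
    # Plug-in standard error from the orthogonal score.
    psi = d_res * (y_res - theta * d_res)
    se = np.sqrt(np.mean(psi ** 2) / n) / np.mean(d_res ** 2)
    return theta, se


def simulate(n=2000, theta=2.0, seed=1):
    """Simulated data (an assumption for illustration, not from the paper)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 2))
    d = 0.5 * X[:, 0] + 0.3 * X[:, 1] ** 2 + rng.normal(size=n)
    y = theta * d + np.sin(X[:, 0]) + X[:, 1] + rng.normal(size=n)
    return y, d, X


if __name__ == "__main__":
    y, d, X = simulate()
    theta_hat, se = dml_partial_out(y, d, X)
    print(f"theta_hat = {theta_hat:.3f} (se {se:.3f})")
```

Cross-fitting keeps each observation's residuals out of sample with respect to the nuisance fits, which is what lets flexible learners be plugged in without biasing the final regression of residuals on residuals.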