A Double Machine Learning Approach to Combining Experimental and Observational Data

📅 2023-07-04
🏛️ arXiv.org
📈 Citations: 9
Influential: 0
🤖 AI Summary
Causal inference that fuses experimental and observational data is often biased because it rests on two untestable assumptions: external validity and ignorability. This paper introduces a double machine learning (DML) framework that tests for violations of these assumptions. The method jointly models both data sources and, when only one of the two assumptions is violated, provides semiparametrically efficient, consistent estimators of treatment effects. A “no-free-lunch” theorem shows that correctly identifying which assumption is violated is necessary for consistent estimation. In comparative analyses and three real-world case studies, the approach outperforms existing data fusion methods. The framework thus offers both theoretical guarantees and a practical diagnostic for assumption violations.
📝 Abstract
Experimental and observational studies often lack validity due to untestable assumptions. We propose a double machine learning approach to combine experimental and observational studies, allowing practitioners to test for assumption violations and estimate treatment effects consistently. Our framework tests for violations of external validity and ignorability under milder assumptions. When only one of these assumptions is violated, we provide semiparametrically efficient treatment effect estimators. However, our no-free-lunch theorem highlights the necessity of accurately identifying the violated assumption for consistent treatment effect estimation. Through comparative analyses, we show our framework's superiority over existing data fusion methods. The practical utility of our approach is further exemplified by three real-world case studies, underscoring its potential for widespread application in empirical research.
Problem

Research questions and friction points this paper is trying to address.

Combining experimental and observational data to estimate treatment effects
Testing validity assumptions in causal inference studies
Providing consistent estimators when assumptions are violated
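
For orientation, the two assumptions named above are often formalized as follows. This is a common textbook-style statement, not necessarily the paper's exact formulation. The notation is assumed here for illustration: $S \in \{e, o\}$ marks the experimental or observational sample, $T$ is the binary treatment, $X$ the observed covariates, and $Y(t)$ the potential outcome under treatment $t$.

```latex
% External validity: conditional effects in the experiment carry over
% to the observational (target) population.
\mathbb{E}\left[ Y(1) - Y(0) \mid X, S = e \right]
  = \mathbb{E}\left[ Y(1) - Y(0) \mid X, S = o \right]

% Ignorability: within the observational sample, treatment is as good
% as random once the observed covariates are conditioned on.
\left( Y(1), Y(0) \right) \perp T \mid X, \; S = o
```

Neither condition can be checked from a single data source alone, which is why the listing describes them as untestable without combining the two studies.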
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combining experimental and observational data via double machine learning
Providing consistent treatment effect estimators under assumption violations
Proposing falsification tests for external validity and ignorability
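
To give a rough feel for the ideas listed above, here is a minimal sketch of a generic cross-fitted, residual-on-residual (partialling-out) DML estimate, followed by a naive z-comparison of the experimental and observational estimates. It is not the paper's estimator or its falsification test. Everything in it is an assumption made for illustration: the helper names (`fit_linear`, `dml_partial_linear`, `simulate`), the constant-effect partially linear model, the toy data-generating process with an unobserved confounder `u`, and the use of plain least squares as a stand-in for a flexible machine learning learner.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit with intercept (stand-in for any ML learner)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ coef

def dml_partial_linear(X, t, y, n_folds=5, seed=0):
    """Cross-fitted partialling-out estimate of a constant effect."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    res_t = np.empty(len(y))
    res_y = np.empty(len(y))
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        # Nuisance models are fit on the other folds, then applied out of fold.
        res_y[test] = y[test] - fit_linear(X[train], y[train])(X[test])
        res_t[test] = t[test] - fit_linear(X[train], t[train])(X[test])
    theta = np.sum(res_t * res_y) / np.sum(res_t ** 2)
    # Standard error based on the estimated influence function.
    psi = res_t * (res_y - theta * res_t)
    se = np.sqrt(np.mean(psi ** 2) / len(y)) / np.mean(res_t ** 2)
    return theta, se

def simulate(n, randomized, rng, effect=2.0):
    """Hypothetical toy data; u is a confounder the estimator never sees."""
    X = rng.normal(size=(n, 3))
    u = rng.normal(size=n)
    if randomized:
        t = rng.binomial(1, 0.5, size=n).astype(float)
    else:
        # Treatment uptake depends on u, so ignorability given X fails.
        t = (X[:, 0] + u + rng.normal(size=n) > 0).astype(float)
    y = effect * t + X @ np.array([1.0, -0.5, 0.25]) + u + rng.normal(size=n)
    return X, t, y

rng = np.random.default_rng(42)
theta_e, se_e = dml_partial_linear(*simulate(2000, True, rng))
theta_o, se_o = dml_partial_linear(*simulate(4000, False, rng))

# Naive discrepancy check between two independent samples.
z = (theta_e - theta_o) / np.sqrt(se_e ** 2 + se_o ** 2)
print(f"experimental: {theta_e:.2f} (se {se_e:.2f})")
print(f"observational: {theta_o:.2f} (se {se_o:.2f})")
print(f"z-statistic for the difference: {z:.1f}")
```

In this toy setup the randomized sample should recover an effect near the simulated value of 2, while the confounded observational sample drifts away from it, so a large |z| flags that at least one assumption fails. The comparison alone cannot say which one, which echoes the no-free-lunch point made in the abstract.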
Marco Morucci
New York University
Vittorio Orlandi
Duke University
Harsh Parikh
Yale University
Causal Inference, Causality, Econometrics, Machine Learning, Statistics
Sudeepa Roy
Duke University, Computer Science
Databases
C. Rudin
Duke University
A. Volfovsky
Duke University