A Double Machine Learning Approach to Combining Experimental and Observational Data

📅 2023-07-04

🏛️ arXiv.org

📈 Citations: 9

✨ Influential: 0


🤖 AI Summary

Causal inference that combines experimental and observational data can be biased when its key assumptions, external validity and ignorability, are violated, and these assumptions are usually untestable. This paper proposes a double machine learning (DML) framework that combines the two data sources and tests for violations of these assumptions under milder conditions. When only one of the assumptions is violated, the method provides semiparametrically efficient, debiased treatment effect estimators. A “no-free-lunch” theorem shows that consistent treatment effect estimation requires correctly identifying which assumption is violated. In comparative analyses, the authors report that the framework outperforms existing data fusion methods, and they illustrate it with three real-world case studies. The framework thus pairs theoretical guarantees with practical diagnostics for combining experimental and observational evidence.

📝 Abstract

Experimental and observational studies often lack validity due to untestable assumptions. We propose a double machine learning approach to combine experimental and observational studies, allowing practitioners to test for assumption violations and estimate treatment effects consistently. Our framework tests for violations of external validity and ignorability under milder assumptions. When only one of these assumptions is violated, we provide semiparametrically efficient treatment effect estimators. However, our no-free-lunch theorem highlights the necessity of accurately identifying the violated assumption for consistent treatment effect estimation. Through comparative analyses, we show our framework's superiority over existing data fusion methods. The practical utility of our approach is further exemplified by three real-world case studies, underscoring its potential for widespread application in empirical research.
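As a rough illustration of the generic DML ingredient the abstract relies on (fit nuisance models on held-out folds, then plug them into a debiased score), here is a minimal sketch of the standard cross-fitted AIPW estimator of the average treatment effect on a single observational sample. It is not the paper's fused estimator or its tests. The linear outcome models, the Newton-fitted logistic propensity model, the hypothetical names `dml_ate`, `fit_linear`, and `fit_logistic`, the two-fold split, the clipping level, and the simulated data are all illustrative assumptions.

```python
import numpy as np

# Illustrative sketch only: a generic cross-fitted AIPW (DML) ATE estimator,
# not the paper's method. Nuisance models are deliberately simple assumptions.

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns a prediction function."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ beta

def fit_logistic(X, t, iters=25):
    """Logistic regression fitted by Newton-Raphson; returns P(T=1 | X)."""
    Xd = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Xd.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Xd @ w))
        grad = Xd.T @ (t - p)
        hess = Xd.T @ (Xd * (p * (1.0 - p))[:, None])
        w += np.linalg.solve(hess + 1e-8 * np.eye(len(w)), grad)
    return lambda Xn: 1.0 / (1.0 + np.exp(-np.column_stack([np.ones(len(Xn)), Xn]) @ w))

def dml_ate(X, t, y, n_folds=2, seed=0, clip=0.01):
    """Cross-fitted AIPW estimate of the ATE and its influence-function SE."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = rng.permutation(n) % n_folds  # balanced random fold labels
    psi = np.empty(n)
    for k in range(n_folds):
        train, test = folds != k, folds == k
        # Nuisances are fit on the other folds only (cross-fitting).
        mu1 = fit_linear(X[train & (t == 1)], y[train & (t == 1)])
        mu0 = fit_linear(X[train & (t == 0)], y[train & (t == 0)])
        e = fit_logistic(X[train], t[train])
        Xk, tk, yk = X[test], t[test], y[test]
        ek = np.clip(e(Xk), clip, 1.0 - clip)  # guard against extreme propensities
        m1, m0 = mu1(Xk), mu0(Xk)
        # Doubly robust score: outcome-model contrast plus weighted residuals.
        psi[test] = m1 - m0 + tk * (yk - m1) / ek - (1 - tk) * (yk - m0) / (1.0 - ek)
    return psi.mean(), psi.std(ddof=1) / np.sqrt(n)

# Hypothetical simulated observational data: X[:, 0] confounds treatment and outcome.
rng = np.random.default_rng(1)
n = 4000
X = rng.normal(size=(n, 2))
t = rng.binomial(1, 1.0 / (1.0 + np.exp(-X[:, 0])))
y = 2.0 * t + X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)
ate_hat, se_hat = dml_ate(X, t, y)
print(f"ATE estimate {ate_hat:.3f} (SE {se_hat:.3f}); simulated true effect is 2.0")
```

Cross-fitting keeps each observation's nuisance predictions independent of its own outcome, which is what lets flexible learners be swapped in for the linear and logistic models here without invalidating the standard error. Note that this estimate is only consistent under ignorability, the very assumption the paper proposes to test.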

Problem

Research questions and friction points this paper is trying to address.

Combining experimental and observational data to estimate treatment effects

Testing validity assumptions in causal inference studies

Estimating treatment effects consistently when one of these assumptions is violated

Innovation

Methods, ideas, or system contributions that make the work stand out.

Combining experimental and observational data via double machine learning

Providing semiparametrically efficient treatment effect estimators when only one assumption is violated

Proposing falsification tests for external validity and ignorability
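The intuition behind falsification tests of this kind can be shown with a much simpler check than the paper's DML-based tests: if the experiment is externally valid for the observational population and ignorability holds there, an experimental and an observational estimate of the same effect should agree up to sampling error. The sketch below is a generic two-sample Wald comparison, not the authors' procedure. The hypothetical function names `diff_in_means` and `comparison_z_test`, the independence and approximate-normality assumptions, and every input number are illustrative.

```python
import math
import statistics

def diff_in_means(y, t):
    """Difference-in-means ATE for a randomized experiment with a Neyman-style SE."""
    y1 = [yi for yi, ti in zip(y, t) if ti == 1]
    y0 = [yi for yi, ti in zip(y, t) if ti == 0]
    est = statistics.mean(y1) - statistics.mean(y0)
    se = math.sqrt(statistics.variance(y1) / len(y1) + statistics.variance(y0) / len(y0))
    return est, se

def comparison_z_test(est_a, se_a, est_b, se_b):
    """Two-sided Wald z-test of H0: both estimates target the same effect.

    Assumes the estimates come from independent samples and are approximately
    normal, so Var(a - b) = se_a**2 + se_b**2.
    """
    z = (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_value

# Hypothetical inputs: a tiny experimental sample and an observational
# estimate with an assumed standard error (e.g. from a DML estimator).
rct_y = [3.0, 5.0, 4.0, 1.0, 2.0, 1.5]
rct_t = [1, 1, 1, 0, 0, 0]
rct_est, rct_se = diff_in_means(rct_y, rct_t)
obs_est, obs_se = 1.0, 0.2
z, p = comparison_z_test(rct_est, rct_se, obs_est, obs_se)
print(f"z = {z:.3f}, two-sided p = {p:.4f}")
```

A small p-value is evidence that at least one assumption (or a modeling choice) fails, but this comparison alone does not reveal which one. That gap matters because, per the abstract, consistent estimation requires knowing which assumption is violated.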
