🤖 AI Summary
This study addresses the complementary integration of experimental data (high internal validity but scarce and costly) and observational data (abundant and inexpensive but subject to unobserved confounding) in causal inference. Methodologically, the authors propose a unified empirical risk minimization framework built on a weighted joint loss that incorporates both external validity (i.e., the generalizability of experimental findings) and model goodness-of-fit into causal parameter estimation. The experimental and observational losses are balanced adaptively via cross-validation, and a non-asymptotic error analysis provides theoretical guarantees. Experiments on synthetic and real-world datasets show that the approach significantly outperforms single-source baselines in estimation accuracy, robustness to confounding, and out-of-sample generalization.
📝 Abstract
We develop new methods to integrate experimental and observational data in causal inference. While randomized controlled trials offer strong internal validity, they are often costly and therefore limited in sample size. Observational data, though cheaper and typically larger, are prone to bias from unmeasured confounders. To harness their complementary strengths, we propose a systematic framework that formulates causal estimation as an empirical risk minimization (ERM) problem. A full model containing the causal parameter is obtained by minimizing a weighted combination of experimental and observational losses, which capture the causal parameter's validity and the full model's fit, respectively. The weight is chosen through cross-validation on the causal parameter across experimental folds. Our experiments on real and synthetic data show the efficacy and reliability of our method. We also establish theoretical non-asymptotic error bounds.
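The core mechanism in the abstract (minimize a weighted combination of an experimental loss and an observational loss, with the weight tuned by cross-validation over experimental folds) can be sketched in a toy setting. The sketch below is an illustrative assumption, not the paper's implementation: it assumes a linear outcome model with a single binary treatment, mean-squared-error losses, and a synthetic data-generating process where the observational treatment is confounded.

```python
import numpy as np

rng = np.random.default_rng(0)
TAU = 2.0  # true causal effect in this synthetic example

# Small randomized experiment: treatment assigned independently of the confounder.
n_exp = 60
u_exp = rng.normal(size=n_exp)                        # unobserved confounder
t_exp = rng.integers(0, 2, size=n_exp).astype(float)  # randomized treatment
y_exp = TAU * t_exp + u_exp + rng.normal(size=n_exp)

# Large observational sample: treatment correlated with the confounder.
n_obs = 2000
u_obs = rng.normal(size=n_obs)
t_obs = (u_obs + rng.normal(size=n_obs) > 0).astype(float)  # confounded
y_obs = TAU * t_obs + u_obs + rng.normal(size=n_obs)

def fit_coef(lam, te, ye, to, yo):
    """Minimize lam * MSE_exp + (1 - lam) * MSE_obs over (tau, intercept).

    Both losses are quadratic, so the joint minimizer solves one 2x2
    normal equation built from lambda-weighted per-sample moments.
    """
    def moments(t, y):
        X = np.column_stack([t, np.ones_like(t)])
        return X.T @ X / len(t), X.T @ y / len(t)
    Ae, be = moments(te, ye)
    Ao, bo = moments(to, yo)
    return np.linalg.solve(lam * Ae + (1 - lam) * Ao,
                           lam * be + (1 - lam) * bo)  # [tau, intercept]

# Fix the experimental folds once so every lambda is scored on the same splits.
folds = np.array_split(rng.permutation(n_exp), 5)

def cv_score(lam):
    """Average held-out experimental loss of the jointly fitted model."""
    errs = []
    for hold in folds:
        tr = np.setdiff1d(np.arange(n_exp), hold)
        tau, b0 = fit_coef(lam, t_exp[tr], y_exp[tr], t_obs, y_obs)
        errs.append(np.mean((y_exp[hold] - (tau * t_exp[hold] + b0)) ** 2))
    return np.mean(errs)

grid = np.linspace(0.05, 0.95, 10)
lam_star = grid[np.argmin([cv_score(l) for l in grid])]
tau_hat, _ = fit_coef(lam_star, t_exp, y_exp, t_obs, y_obs)
print(f"lambda* = {lam_star:.2f}, tau_hat = {tau_hat:.2f} (true tau = {TAU})")
```

The trade-off the weight controls is visible here: a purely observational fit inherits the confounding bias, while a purely experimental fit is unbiased but noisy at n = 60; the cross-validated weight interpolates between the two.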