🤖 AI Summary
This study addresses the complementary integration of experimental data (high internal validity but scarce and costly) and observational data (abundant and inexpensive but subject to unobserved confounding) in causal inference. Methodologically, the authors propose a unified empirical risk minimization framework built on a weighted joint loss that incorporates both external validity (i.e., the generalizability of experimental findings) and model goodness-of-fit into causal parameter estimation. The experimental and observational losses are balanced adaptively via cross-validation, and a non-asymptotic error analysis provides theoretical guarantees. Experiments on synthetic and real-world datasets show that the approach significantly outperforms single-source baselines in estimation accuracy, robustness to confounding, and out-of-sample generalization.
📝 Abstract
We develop new methods to integrate experimental and observational data in causal inference. While randomized controlled trials offer strong internal validity, they are often costly and therefore limited in sample size. Observational data, though cheaper and typically larger, are prone to bias from unmeasured confounders. To harness their complementary strengths, we propose a systematic framework that formulates causal estimation as an empirical risk minimization (ERM) problem. A full model containing the causal parameter is obtained by minimizing a weighted combination of experimental and observational losses, which capture the causal parameter's validity and the full model's fit, respectively. The weight is chosen through cross-validation on the causal parameter across experimental folds. Our experiments on real and synthetic data show the efficacy and reliability of our method. We also establish theoretical non-asymptotic error bounds.
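The core mechanism in the abstract (minimize a weighted combination of an experimental loss and an observational loss, with the weight tuned by cross-validation over experimental folds) can be sketched in a toy setting. The sketch below is an illustrative assumption, not the paper's implementation: it assumes a linear outcome model with a single binary treatment, mean-squared-error losses, and a synthetic data-generating process where the observational treatment is confounded.

```python
import numpy as np

rng = np.random.default_rng(0)
TAU = 2.0  # true causal effect in this synthetic example

# Small randomized experiment: treatment assigned independently of the confounder.
n_exp = 60
u_exp = rng.normal(size=n_exp)                        # unobserved confounder
t_exp = rng.integers(0, 2, size=n_exp).astype(float)  # randomized treatment
y_exp = TAU * t_exp + u_exp + rng.normal(size=n_exp)

# Large observational sample: treatment correlated with the confounder.
n_obs = 2000
u_obs = rng.normal(size=n_obs)
t_obs = (u_obs + rng.normal(size=n_obs) > 0).astype(float)  # confounded
y_obs = TAU * t_obs + u_obs + rng.normal(size=n_obs)

def fit_coef(lam, te, ye, to, yo):
    """Minimize lam * MSE_exp + (1 - lam) * MSE_obs over (tau, intercept).

    Both losses are quadratic, so the joint minimizer solves one 2x2
    normal equation built from lambda-weighted per-sample moments.
    """
    def moments(t, y):
        X = np.column_stack([t, np.ones_like(t)])
        return X.T @ X / len(t), X.T @ y / len(t)
    Ae, be = moments(te, ye)
    Ao, bo = moments(to, yo)
    return np.linalg.solve(lam * Ae + (1 - lam) * Ao,
                           lam * be + (1 - lam) * bo)  # [tau, intercept]

# Fix the experimental folds once so every lambda is scored on the same splits.
folds = np.array_split(rng.permutation(n_exp), 5)

def cv_score(lam):
    """Average held-out experimental loss of the jointly fitted model."""
    errs = []
    for hold in folds:
        tr = np.setdiff1d(np.arange(n_exp), hold)
        tau, b0 = fit_coef(lam, t_exp[tr], y_exp[tr], t_obs, y_obs)
        errs.append(np.mean((y_exp[hold] - (tau * t_exp[hold] + b0)) ** 2))
    return np.mean(errs)

grid = np.linspace(0.05, 0.95, 10)
lam_star = grid[np.argmin([cv_score(l) for l in grid])]
tau_hat, _ = fit_coef(lam_star, t_exp, y_exp, t_obs, y_obs)
print(f"lambda* = {lam_star:.2f}, tau_hat = {tau_hat:.2f} (true tau = {TAU})")
```

The trade-off the weight controls is visible here: a purely observational fit inherits the confounding bias, while a purely experimental fit is unbiased but noisy at n = 60; the cross-validated weight interpolates between the two.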