The Hardness of Validating Observational Studies with Experimental Data

📅 2025-03-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the problem of correcting systematic bias in causal effect estimates from large-scale observational studies using small-scale experimental data. Observational estimates are often biased due to unmeasured confounding, while experimental data—though capable of falsifying observational findings—cannot fully validate or eliminate such bias without additional assumptions. The paper establishes, for the first time, theoretical limits on “observational + experimental” fusion methods, proving that regularity conditions (e.g., smoothness) are necessary for bias correction. To overcome these limitations, the authors propose a novel semi-parametric correction framework based on Gaussian processes, enabling reliable interval estimation of causal effects—even across non-overlapping support domains. Theoretical analysis, simulations, and experiments on semi-synthetic data demonstrate the method’s effectiveness. The implementation is publicly available.

📝 Abstract
Observational data is often readily available in large quantities, but can lead to biased causal effect estimates due to the presence of unobserved confounding. Recent works attempt to remove this bias by supplementing observational data with experimental data, which, when available, is typically on a smaller scale due to the time and cost involved in running a randomised controlled trial. In this work, we prove a theorem that places fundamental limits on this "best of both worlds" approach. Using the framework of impossible inference, we show that although it is possible to use experimental data to *falsify* causal effect estimates from observational data, in general it is not possible to *validate* such estimates. Our theorem proves that while experimental data can be used to detect bias in observational studies, without additional assumptions on the smoothness of the correction function, it cannot be used to remove it. We provide a practical example of such an assumption, developing a novel Gaussian Process based approach to construct intervals which contain the true treatment effect with high probability, both inside and outside of the support of the experimental data. We demonstrate our methodology on both simulated and semi-synthetic datasets and make the [code available](https://github.com/Jakefawkes/Obs_and_exp_data).
Problem

Research questions and friction points this paper is trying to address.

Limits of validating observational data with experimental data.
Experimental data can detect but not remove bias without assumptions.
Proposes a Gaussian Process method for interval estimation of treatment effects.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses experimental data to detect observational bias
Develops Gaussian Process for treatment effect intervals
Validates methodology on simulated and semi-synthetic datasets
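To make the core idea concrete, here is a minimal sketch (not the paper's implementation) of the kind of correction the Innovation bullets describe: under an assumed smoothness (RBF kernel) prior, a Gaussian Process is fit to the gap between observational and experimental estimates on the experimental support, and its posterior yields intervals that widen as we extrapolate outside that support. All variable names and the synthetic bias function are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=0.2, variance=1.0):
    # Squared-exponential kernel encoding the smoothness assumption
    d = X1[:, None] - X2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

rng = np.random.default_rng(0)

# Hypothetical 1-D covariate; experimental data only covers [0, 1]
x_exp = rng.uniform(0.0, 1.0, 20)
true_bias = lambda x: 0.5 * np.sin(2 * np.pi * x)          # assumed smooth correction function
y = true_bias(x_exp) + 0.05 * rng.normal(size=20)          # observed obs-minus-exp estimate gap

# Test grid extending beyond the experimental support
x_test = np.linspace(-0.5, 1.5, 50)

# Standard GP regression posterior via Cholesky factorisation
noise = 0.05 ** 2
K = rbf_kernel(x_exp, x_exp) + noise * np.eye(len(x_exp))
K_s = rbf_kernel(x_exp, x_test)
K_ss = rbf_kernel(x_test, x_test)
L = np.linalg.cholesky(K)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
mean = K_s.T @ alpha                                       # posterior mean correction
v = np.linalg.solve(L, K_s)
var = np.clip(np.diag(K_ss - v.T @ v), 0.0, None)          # posterior variance (clipped for stability)

# 95% intervals: narrow inside the experimental support, wide outside it
lo, hi = mean - 1.96 * np.sqrt(var), mean + 1.96 * np.sqrt(var)
```

The interval width reflects how far a query point lies from the experimental data: far outside the support the posterior reverts to the prior, so the intervals honestly signal that the correction is no longer pinned down by experiments.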
Jake Fawkes
University Of Oxford
Machine Learning · Causal Inference
Michael O'Riordan
Spotify
Athanasios Vlontzos
Spotify & Imperial College London
Oriol Corcoll
Spotify
Ciarán Mark Gilligan-Lee
Spotify & University College London