Many Experiments, Few Repetitions, Unpaired Data, and Sparse Effects: Is Causal Inference Possible?

📅 2026-01-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work proposes a generalized method of moments (GMM) framework that treats the experimental environments as a high-dimensional instrumental variable. It targets causal effect estimation under three difficulties at once: covariates and outcomes are never observed jointly, unobserved confounders are present, and each environment contributes only a few observations. By combining cross-fold sample splitting, ℓ₁ regularization, and post-selection refitting, the proposed estimator is consistent as the number of environments grows to infinity while the sample size per environment stays fixed, and it can recover sparse causal effects. Conventional two-sample instrumental variable estimators are inconsistent in this setting.
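
To make the role of sample splitting concrete, the following is a minimal NumPy sketch, not the paper's estimator. It assumes a hypothetical linear model with zero-mean environment shifts `mu`, a hidden confounder `h`, and two observations per environment; the function names (`simulate_unpaired`, `naive_two_sample_iv`, `split_iv`) are illustrative. The naive estimator regresses environment means of $Y$ on environment means of $X$, so the sampling noise in the $X$ means enters the second-moment matrix and biases the estimate. The split version multiplies means from two disjoint folds, whose noise is independent, so that term averages out.

```python
import numpy as np

def simulate_unpaired(K, n, beta, rng):
    """Hypothetical data generator: K environments, n draws each, X and Y unpaired."""
    d = beta.size
    mu = rng.normal(size=(K, d))  # environment-specific mean shifts (the "instrument")

    def draw_x():
        h = rng.normal(size=(K, n))  # hidden confounder
        x = mu[:, None, :] + h[:, :, None] + rng.normal(size=(K, n, d))
        return x, h

    X, _ = draw_x()           # covariate sample: outcomes not recorded
    X_hidden, h = draw_x()    # separate units: covariates not recorded
    Y = X_hidden @ beta + 2.0 * h + rng.normal(size=(K, n))
    return X, Y

def naive_two_sample_iv(X, Y):
    """Regress environment means of Y on environment means of X."""
    xbar, ybar = X.mean(axis=1), Y.mean(axis=1)
    return np.linalg.solve(xbar.T @ xbar, xbar.T @ ybar)

def split_iv(X, Y):
    """Use cross-products of means from two disjoint folds of the X sample."""
    half = X.shape[1] // 2
    x1, x2 = X[:, :half].mean(axis=1), X[:, half:].mean(axis=1)
    ybar = Y.mean(axis=1)
    A = x1.T @ x2 + x2.T @ x1   # symmetrized cross-fold second moment
    b = (x1 + x2).T @ ybar
    return np.linalg.solve(A, b)

rng = np.random.default_rng(0)
beta_true = np.array([1.0, -0.5, 0.0])
X, Y = simulate_unpaired(K=50_000, n=2, beta=beta_true, rng=rng)
beta_naive = naive_two_sample_iv(X, Y)
beta_split = split_iv(X, Y)
print("naive:", np.round(beta_naive, 2))
print("split:", np.round(beta_split, 2))
```

In this toy setup the naive estimate stays visibly away from `beta_true` however many environments are added, while the split estimate approaches it, which mirrors the consistency claim in the summary under the stated assumptions.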

📝 Abstract

We study the problem of estimating causal effects under hidden confounding in the following unpaired data setting: we observe some covariates $X$ and an outcome $Y$ under different experimental conditions (environments) but never jointly; each observation contains either $X$ or $Y$. Under appropriate regularity conditions, the problem can be cast as an instrumental variable (IV) regression with the environment acting as a (possibly high-dimensional) instrument. When there are many environments but only a few observations per environment, standard two-sample IV estimators fail to be consistent. We propose a GMM-type estimator based on cross-fold sample splitting of the instrument-covariate sample and prove that it is consistent as the number of environments grows while the sample size per environment remains constant. We further extend the method to sparse causal effects via $\ell_1$-regularized estimation and post-selection refitting.
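
The sparse extension can be illustrated with a small sketch. This is an assumption-laden toy, not the paper's procedure: it forms cross-fold moments from environment means, minimizes an identity-weighted GMM objective $\tfrac12\lVert A\beta - b\rVert_2^2 + \lambda\lVert\beta\rVert_1$ by proximal gradient descent (ISTA), and then refits without the penalty on the selected support. The data generator, the penalty level `lam=0.15`, and all function names are hypothetical choices made for illustration.

```python
import numpy as np

def simulate_unpaired(K, n, beta, rng):
    """Hypothetical generator: environment shifts mu, hidden confounder h, unpaired X and Y."""
    d = beta.size
    mu = rng.normal(size=(K, d))
    h_x, h_y = rng.normal(size=(K, n)), rng.normal(size=(K, n))
    X = mu[:, None, :] + h_x[:, :, None] + rng.normal(size=(K, n, d))
    X_hidden = mu[:, None, :] + h_y[:, :, None] + rng.normal(size=(K, n, d))
    Y = X_hidden @ beta + 2.0 * h_y + rng.normal(size=(K, n))
    return X, Y

def split_moments(X, Y):
    """Cross-fold moments: A estimates E[mu mu^T], b estimates E[mu mu^T] beta."""
    K, n, _ = X.shape
    half = n // 2
    x1, x2 = X[:, :half].mean(axis=1), X[:, half:].mean(axis=1)
    ybar = Y.mean(axis=1)
    A = (x1.T @ x2 + x2.T @ x1) / (2 * K)
    b = (x1 + x2).T @ ybar / (2 * K)
    return A, b

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def l1_gmm(A, b, lam, n_iter=500):
    """ISTA for 0.5 * ||A beta - b||^2 + lam * ||beta||_1."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz constant of the gradient
    beta = np.zeros(b.size)
    for _ in range(n_iter):
        grad = A.T @ (A @ beta - b)
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

def refit_on_support(A, b, support):
    """Unpenalized least-squares refit restricted to the selected coordinates."""
    beta = np.zeros(b.size)
    if support.any():
        beta[support] = np.linalg.lstsq(A[:, support], b, rcond=None)[0]
    return beta

rng = np.random.default_rng(0)
beta_true = np.zeros(10)
beta_true[:2] = [1.5, -1.0]                  # only two covariates have a causal effect
X, Y = simulate_unpaired(K=50_000, n=2, beta=beta_true, rng=rng)
A, b = split_moments(X, Y)
beta_l1 = l1_gmm(A, b, lam=0.15)
support = np.abs(beta_l1) > 1e-8
beta_post = refit_on_support(A, b, support)
print("selected:", np.flatnonzero(support))
print("refit:", np.round(beta_post, 2))
```

The refit step is there because the $\ell_1$ penalty shrinks the nonzero coefficients toward zero; re-estimating on the selected support removes that shrinkage, at the cost of relying on the selection being correct.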

Problem

Research questions and friction points this paper is trying to address.

causal inference

hidden confounding

unpaired data

instrumental variable

sparse effects

Innovation

Methods, ideas, or system contributions that make the work stand out.

instrumental variable

unpaired data

GMM estimator