🤖 AI Summary
This work addresses statistical inference under covariate shift, which arises in missing data and causal inference settings, by proposing a Wasserstein distance minimization framework that avoids explicit modeling of regression functions or importance weights. Under the assumption that the conditional response distribution is invariant, the resulting W-estimator admits a closed-form solution and is numerically equivalent to the 1-nearest neighbor estimator, thereby offering a novel optimal transport interpretation of nearest neighbor methods. Theoretical analysis establishes asymptotic normality with root-n convergence, shows that the estimator is not asymptotically linear, and demonstrates super-efficiency beyond the semiparametric efficiency bound under certain conditions. Numerical experiments and empirical validation on rainfall data confirm the strong performance of the proposed approach.
📝 Abstract
Covariate shift arises when covariate distributions differ between source and target populations while the conditional distribution of the response remains invariant, and it underlies problems in missing data and causal inference. We propose a minimum Wasserstein distance estimation framework for inference under covariate shift that avoids explicit modeling of outcome regressions or importance weights. The resulting W-estimator admits a closed-form expression and is numerically equivalent to the classical 1-nearest neighbor estimator, yielding a new optimal transport interpretation of nearest neighbor methods. We establish root-$n$ asymptotic normality and show that the estimator is not asymptotically linear, leading to super-efficiency relative to the semiparametric efficient estimator under covariate shift in certain regimes, and uniformly in missing data problems. Numerical simulations, along with an analysis of a rainfall dataset, underscore the exceptional performance of our W-estimator.
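The claimed equivalence between the closed-form W-estimator and the classical 1-nearest neighbor estimator can be illustrated concretely. The following is a minimal sketch of a 1-NN imputation estimator of the target-population mean of the response under covariate shift: each target covariate is matched to its nearest source covariate, and the matched source responses are averaged. The function name, variable names, and toy data are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np

def one_nn_estimator(x_source, y_source, x_target):
    """Estimate the target mean of Y by 1-NN covariate matching.

    Each target unit's response is imputed with the response of its
    nearest source unit; the estimator is the average of the imputations.
    """
    x_source = np.asarray(x_source, dtype=float)
    y_source = np.asarray(y_source, dtype=float)
    x_target = np.asarray(x_target, dtype=float)
    # Pairwise |x_t - x_s| distances between target and source covariates.
    d = np.abs(x_target[:, None] - x_source[None, :])
    nn = d.argmin(axis=1)       # index of the nearest source unit
    return y_source[nn].mean()  # imputed target mean of Y

# Toy example: Y = 2X on the source; target covariates shifted to the right.
x_s = [0.0, 1.0, 2.0, 3.0]
y_s = [0.0, 2.0, 4.0, 6.0]
x_t = [2.4, 2.6, 3.5]
# Nearest source covariates are 2, 3, 3, so the estimate is mean(4, 6, 6).
print(one_nn_estimator(x_s, y_s, x_t))  # → 5.333...
```

In the paper's optimal transport reading, this matching is the coupling that minimizes the Wasserstein distance between the empirical source and target covariate distributions, so no outcome regression or importance-weight model needs to be fitted.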