🤖 AI Summary
This work addresses statistical inference under covariate shift, which arises in missing data and causal inference settings, by proposing a Wasserstein distance minimization framework that avoids explicit modeling of regression functions or importance weights. Under the assumption that the conditional response distribution is invariant, the resulting W-estimator admits a closed-form solution and is numerically equivalent to the 1-nearest neighbor estimator, thereby offering a novel optimal transport interpretation of nearest neighbor methods. Theoretical analysis establishes asymptotic normality with root-n convergence, shows that the estimator is not asymptotically linear, and demonstrates super-efficiency beyond the semiparametric efficiency bound under certain conditions. Numerical experiments and empirical validation on rainfall data confirm the strong performance of the proposed approach.
📝 Abstract
Covariate shift arises when covariate distributions differ between source and target populations while the conditional distribution of the response remains invariant, and it underlies problems in missing data and causal inference. We propose a minimum Wasserstein distance estimation framework for inference under covariate shift that avoids explicit modeling of outcome regressions or importance weights. The resulting W-estimator admits a closed-form expression and is numerically equivalent to the classical 1-nearest neighbor estimator, yielding a new optimal transport interpretation of nearest neighbor methods. We establish root-$n$ asymptotic normality and show that the estimator is not asymptotically linear, leading to super-efficiency relative to the semiparametric efficient estimator under covariate shift in certain regimes, and uniformly in missing data problems. Numerical simulations, along with an analysis of a rainfall dataset, underscore the exceptional performance of our W-estimator.
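The claimed equivalence between the closed-form W-estimator and the classical 1-nearest neighbor estimator can be illustrated concretely. The following is a minimal sketch of a 1-NN imputation estimator of the target-population mean of the response under covariate shift: each target covariate is matched to its nearest source covariate, and the matched source responses are averaged. The function name, variable names, and toy data are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np

def one_nn_estimator(x_source, y_source, x_target):
    """Estimate the target mean of Y by 1-NN covariate matching.

    Each target unit's response is imputed with the response of its
    nearest source unit; the estimator is the average of the imputations.
    """
    x_source = np.asarray(x_source, dtype=float)
    y_source = np.asarray(y_source, dtype=float)
    x_target = np.asarray(x_target, dtype=float)
    # Pairwise |x_t - x_s| distances between target and source covariates.
    d = np.abs(x_target[:, None] - x_source[None, :])
    nn = d.argmin(axis=1)       # index of the nearest source unit
    return y_source[nn].mean()  # imputed target mean of Y

# Toy example: Y = 2X on the source; target covariates shifted to the right.
x_s = [0.0, 1.0, 2.0, 3.0]
y_s = [0.0, 2.0, 4.0, 6.0]
x_t = [2.4, 2.6, 3.5]
# Nearest source covariates are 2, 3, 3, so the estimate is mean(4, 6, 6).
print(one_nn_estimator(x_s, y_s, x_t))  # → 5.333...
```

In the paper's optimal transport reading, this matching is the coupling that minimizes the Wasserstein distance between the empirical source and target covariate distributions, so no outcome regression or importance-weight model needs to be fitted.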