Optimal Transport with Heterogeneously Missing Data

📅 2025-05-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses optimal transport (OT) between two empirical distributions with heterogeneous missingness, where missingness probabilities differ across features and between the two distributions, under the missing-completely-at-random (MCAR) assumption. It shows that the Wasserstein distance between empirical Gaussian distributions, and linear Monge maps between arbitrary distributions, can be debiased without significantly affecting the sample complexity. It also shows that entropic regularized OT can be estimated efficiently and consistently using iterative singular value thresholding (ISVT), and it proposes a validation-set-free hyperparameter selection strategy for ISVT based on the paper's estimator of the Bures-Wasserstein distance, which may be of independent interest for general matrix completion. The findings are validated on a range of numerical applications.

📝 Abstract
We consider the problem of solving the optimal transport problem between two empirical distributions with missing values. Our main assumption is that the data is missing completely at random (MCAR), but we allow for heterogeneous missingness probabilities across features and across the two distributions. As a first contribution, we show that the Wasserstein distance between empirical Gaussian distributions and linear Monge maps between arbitrary distributions can be debiased without significantly affecting the sample complexity. Secondly, we show that entropic regularized optimal transport can be estimated efficiently and consistently using iterative singular value thresholding (ISVT). We propose a validation set-free hyperparameter selection strategy for ISVT that leverages our estimator of the Bures-Wasserstein distance, which could be of independent interest in general matrix completion problems. Finally, we validate our findings on a wide range of numerical applications.

Problem

Research questions and friction points this paper is trying to address.

Debiasing Wasserstein distance for empirical Gaussian distributions with missing data
Estimating entropic regularized optimal transport using iterative singular value thresholding
Proposing hyperparameter selection for ISVT without validation sets
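
To make the first item concrete, here is a minimal sketch of how Gaussian moments can be estimated from MCAR data with per-feature missingness and then plugged into the closed-form Bures-Wasserstein distance. It uses the standard available-case (pairwise co-observation) correction, which is an assumption for illustration and not necessarily the estimator derived in the paper. The function names `debiased_moments`, `psd_sqrt`, and `bures_wasserstein_sq` are hypothetical, and missing entries are assumed to be encoded as NaN.

```python
import numpy as np

def debiased_moments(X):
    """Mean and covariance from an (n, d) array with NaN for missing entries.

    Each entry is normalized by the number of samples in which the relevant
    feature (or feature pair) was observed, so features with different
    missingness rates are handled separately. Assumes every pair of
    features is co-observed at least once.
    """
    M = ~np.isnan(X)                          # observation mask
    mean = np.where(M, X, 0.0).sum(axis=0) / M.sum(axis=0)
    Xc = np.where(M, X - mean, 0.0)           # centered, zero where missing
    Mf = M.astype(float)
    pair_counts = Mf.T @ Mf                   # co-observation counts
    cov = (Xc.T @ Xc) / pair_counts
    return mean, cov

def psd_sqrt(A):
    """Symmetric square root; negative eigenvalues are clipped to zero,
    since a pairwise-corrected covariance need not be positive semidefinite."""
    w, V = np.linalg.eigh((A + A.T) / 2.0)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def bures_wasserstein_sq(m1, C1, m2, C2):
    """Squared 2-Wasserstein distance between N(m1, C1) and N(m2, C2)."""
    R = psd_sqrt(C1)
    cross = psd_sqrt(R @ C2 @ R)
    return float(np.sum((m1 - m2) ** 2)
                 + np.trace(C1) + np.trace(C2) - 2.0 * np.trace(cross))

# Usage on synthetic data with feature-specific observation probabilities.
rng = np.random.default_rng(0)
p_obs = np.array([0.9, 0.7, 0.6])             # hypothetical per-feature rates
A = rng.normal(size=(500, 3))
B = rng.normal(loc=1.0, size=(500, 3))
A[rng.random(A.shape) > p_obs] = np.nan
B[rng.random(B.shape) > p_obs[::-1]] = np.nan
dist_sq = bures_wasserstein_sq(*debiased_moments(A), *debiased_moments(B))
```

Normalizing by co-observation counts rather than by the sample size is what prevents zero-imputed entries from shrinking the covariance estimate.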

Innovation

Methods, ideas, or system contributions that make the work stand out.

Debiasing Wasserstein distance for Gaussian distributions
Efficient entropic OT via iterative SVT
Hyperparameter selection without validation sets
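
For readers unfamiliar with iterative singular value thresholding, the following is a generic sketch of the idea for matrix completion: fill the missing entries with the current estimate, soft-threshold the singular values, and repeat. This follows the common soft-impute pattern and is an assumption for illustration. It is not the paper's exact ISVT procedure, and it does not show the Bures-Wasserstein-based selection of the threshold. The names `svt` and `iterative_svt` and the parameters `tau`, `n_iter`, and `tol` are hypothetical.

```python
import numpy as np

def svt(A, tau):
    """Soft-threshold the singular values of A by tau."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def iterative_svt(Y, mask, tau, n_iter=200, tol=1e-6):
    """Low-rank completion of Y, where mask is True on observed entries.

    Entries of Y outside the mask are ignored (they may be NaN).
    """
    Y_obs = np.where(mask, Y, 0.0)
    Z = np.zeros_like(Y_obs, dtype=float)
    for _ in range(n_iter):
        filled = np.where(mask, Y_obs, Z)     # keep observed, impute the rest
        Z_new = svt(filled, tau)
        converged = np.linalg.norm(Z_new - Z) <= tol * max(1.0, np.linalg.norm(Z))
        Z = Z_new
        if converged:
            break
    return Z

# Usage: a rank-1 matrix with roughly 30% of its entries hidden.
rng = np.random.default_rng(0)
truth = np.outer(rng.normal(size=20), rng.normal(size=15))
mask = rng.random(truth.shape) < 0.7
estimate = iterative_svt(np.where(mask, truth, np.nan), mask, tau=0.1)
```

The threshold `tau` controls the rank of the estimate and is the hyperparameter that would normally be tuned on held-out entries, which is the step the validation-set-free strategy is meant to replace.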