Extension of coupling via the Projection of Optimal Transport

📅 2026-03-29
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the challenge of effectively integrating a small amount of coupled data with abundant uncoupled marginal observations to enhance downstream statistical inference. The authors propose a fully nonparametric approach that aligns marginal data with limited coupled samples via optimal transport projections, and they introduce an explicit estimator grounded in the notion of “shadow” couplings to extrapolate the dependence structure and improve estimation accuracy. The method offers a geometric interpretation and numerical stability, and it can be computed in near-linear time and in parallel. Theoretical guarantees are established by combining tools from optimal transport theory, projection-based estimation, and sample complexity analysis. Extensive experiments on both synthetic and real-world datasets demonstrate the method's high accuracy and computational efficiency.
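The projection and the shadow construction can be written out as a short derivation. This is a sketch under our own reading of the summary, not the paper's stated formulation: we assume the coupled data give an empirical coupling π_n ∈ Π(μ_n, ν_n), the marginal data give empirical measures μ̃ and ν̃, and the cost on X × Y is the additive cost d_X^p + d_Y^p. The symbols π_n, μ̃, ν̃, κ_1, κ_2 are introduced here for illustration.

```latex
% Assumed notation: \pi_n \in \Pi(\mu_n,\nu_n) from coupled data; \tilde\mu, \tilde\nu from marginal data.
\hat\pi \in \operatorname*{arg\,min}_{\pi \in \Pi(\tilde\mu,\tilde\nu)} W_p(\pi,\pi_n),
\qquad c\big((x,y),(x',y')\big) = d_X(x,x')^p + d_Y(y,y')^p

% Lower bound: any coupling of \pi and \pi_n projects onto couplings of (\tilde\mu,\mu_n) and (\tilde\nu,\nu_n).
W_p(\pi,\pi_n)^p \ge W_p(\tilde\mu,\mu_n)^p + W_p(\tilde\nu,\nu_n)^p
\quad \text{for every } \pi \in \Pi(\tilde\mu,\tilde\nu)

% Shadow: glue W_p-optimal \kappa_1 \in \Pi(\tilde\mu,\mu_n) and \kappa_2 \in \Pi(\nu_n,\tilde\nu) through \pi_n.
\hat\pi(dx',dy') = \int \kappa_1(dx' \mid x)\,\kappa_2(dy' \mid y)\,\pi_n(dx,dy)
```

Because the gluing is conditionally independent, the induced coupling between π̂ and π_n has cost exactly W_p(μ̃, μ_n)^p + W_p(ν̃, ν_n)^p. The shadow therefore attains the lower bound, which is why an explicit formula for the projection is available.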
πŸ“ Abstract
In many statistical settings, two types of data are available: coupled data, which preserve the joint structure among variables but are limited in size due to cost or privacy constraints, and marginal data, which are available at larger scales but lack joint structure. Since standard methods require coupled data, marginal information is often discarded. We propose a fully nonparametric procedure that integrates decoupled marginal data with a limited amount of coupled data to improve downstream analysis. The approach can be understood as an extension of coupling via projection in optimal transport. Specifically, the estimator is a solution to the optimal transport projection over the space of probability measures, which provides a natural geometric interpretation. We establish its stability and derive its sample complexity using recent advances in statistical optimal transport. In addition, we give an explicit formula for the estimator based on the “shadow,” a notion introduced by Eckstein and Nutz. Furthermore, the estimator can be approximated in almost linear time and in parallel by the entropic shadow, which highlights both the theoretical and practical strengths of our method. Lastly, we present experiments on real and synthetic data that demonstrate the performance of our method.
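The entropic approximation can be sketched in Python with NumPy and SciPy. The assumptions here are ours, not the paper's. Both spaces are Euclidean with squared-distance cost (p = 2), and all measures are uniform empirical measures. The "entropic shadow" is taken to mean the shadow construction with the two optimal couplings replaced by Sinkhorn (entropic OT) plans. The helper names `sq_dists`, `uniform`, `sinkhorn_plan`, and `entropic_shadow` are hypothetical. Each coupled pair (x_i, y_i) is treated as an atom of weight 1/n. Gluing the plan A (from the X-marginal sample to the paired x's) with the plan B (from the paired y's to the Y-marginal sample) then gives P = n · A B.

```python
import numpy as np
from scipy.special import logsumexp


def sq_dists(P, Q):
    """Pairwise squared Euclidean distances (assumed cost, p = 2) between rows of P and Q."""
    return ((P[:, None, :] - Q[None, :, :]) ** 2).sum(axis=-1)


def uniform(k):
    """Uniform weights of an empirical measure with k atoms."""
    return np.full(k, 1.0 / k)


def sinkhorn_plan(a, b, C, eps=0.02, n_iter=2000, tol=1e-9):
    """Entropic OT plan between weight vectors a and b (log-domain Sinkhorn).

    The cost is rescaled to [0, 1], so eps is relative to the largest cost.
    """
    C = C / C.max() if C.max() > 0 else C
    log_a, log_b = np.log(a), np.log(b)
    f, g = np.zeros_like(a), np.zeros_like(b)
    for _ in range(n_iter):
        f = eps * (log_a - logsumexp((g[None, :] - C) / eps, axis=1))
        g = eps * (log_b - logsumexp((f[:, None] - C) / eps, axis=0))
        # Column sums are exact after the g-update; stop once the rows match too.
        row_sums = np.exp(logsumexp((f[:, None] + g[None, :] - C) / eps, axis=1))
        if np.abs(row_sums - a).sum() < tol:
            break
    return np.exp((f[:, None] + g[None, :] - C) / eps)


def entropic_shadow(X_pair, Y_pair, X_marg, Y_marg, eps=0.02):
    """Approximate shadow of the paired empirical coupling onto Pi(mu_marg, nu_marg).

    Hypothetical helper. Assumption: the "entropic shadow" replaces the two exact
    OT couplings of the shadow construction with Sinkhorn plans.
    """
    n = len(X_pair)
    # The two transport problems share no data and could be solved in parallel.
    A = sinkhorn_plan(uniform(len(X_marg)), uniform(n), sq_dists(X_marg, X_pair), eps)
    B = sinkhorn_plan(uniform(n), uniform(len(Y_marg)), sq_dists(Y_pair, Y_marg), eps)
    # pi_n puts mass 1/n on each pair (x_i, y_i); gluing A, pi_n, B gives n * A @ B.
    return n * A @ B


rng = np.random.default_rng(0)
X_pair = rng.normal(size=(20, 1))
Y_pair = X_pair + 0.1 * rng.normal(size=(20, 1))  # few, strongly dependent pairs
X_marg = rng.normal(size=(200, 1))                # abundant uncoupled x's
Y_marg = rng.normal(size=(300, 1))                # abundant uncoupled y's
P = entropic_shadow(X_pair, Y_pair, X_marg, Y_marg)
print(P.shape)  # (200, 300)
```

The two Sinkhorn problems share no data, so they can run in parallel. This dense version costs O(n·M) per iteration for a marginal sample of size M. A smaller `eps` (relative to the largest pairwise cost) keeps more of the dependence carried over from the pairs, at the price of slower convergence.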
Problem

Research questions and friction points this paper is trying to address.

optimal transport
coupled data
marginal data
data integration
nonparametric estimation
Innovation

Methods, ideas, or system contributions that make the work stand out.

optimal transport
projection coupling
marginal data integration
shadow estimator
nonparametric statistics