Fast distance computation of multivariate distributions via nonparanormal transport

📅 2026-02-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the high computational cost of the Wasserstein distance for multivariate distributions, which limits its use with distributional data. The authors propose the Nonparanormal Transport (NPT) metric, a distance based on the nonparanormal distribution family, which can model skewed and non-Gaussian multivariate data. By leveraging nonparametric transformations and a Gaussian coupling structure, they derive a closed-form distance formula. In simulations, NPT maintains a high level of agreement with the Wasserstein distance while being at least 1000 times faster than efficient Wasserstein variants when computing a 100-distribution pairwise distance matrix in both 2 and 5 dimensions. The approach is applied in a multidimensional scaling analysis of bivariate oxygen desaturation distributions of 723 individuals with sleep apnea in the Sleep Heart Health Study.
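To make the idea of a closed-form transport distance concrete, the sketch below computes the 2-Wasserstein distance between two multivariate Gaussians, a case where a closed-form expression is well known. It is not the NPT formula, which is not stated on this page; the function names `gaussian_w2` and `_sqrtm_psd` are illustrative assumptions.

```python
import numpy as np


def _sqrtm_psd(a):
    """Symmetric square root of a positive semidefinite matrix (hypothetical helper)."""
    w, v = np.linalg.eigh(a)
    w = np.clip(w, 0.0, None)  # guard against tiny negative eigenvalues from rounding
    return (v * np.sqrt(w)) @ v.T


def gaussian_w2(m1, s1, m2, s2):
    """2-Wasserstein distance between N(m1, s1) and N(m2, s2).

    Uses W2^2 = ||m1 - m2||^2 + tr(s1 + s2 - 2 (s1^{1/2} s2 s1^{1/2})^{1/2}).
    """
    m1, m2 = np.asarray(m1, dtype=float), np.asarray(m2, dtype=float)
    s1, s2 = np.asarray(s1, dtype=float), np.asarray(s2, dtype=float)
    r1 = _sqrtm_psd(s1)
    cross = _sqrtm_psd(r1 @ s2 @ r1)
    squared = np.sum((m1 - m2) ** 2) + np.trace(s1 + s2 - 2.0 * cross)
    return float(np.sqrt(max(squared, 0.0)))  # clamp rounding noise below zero


# Same covariance, means shifted by (3, 4): the distance is the Euclidean shift.
d_shift = gaussian_w2([0, 0], np.eye(2), [3, 4], np.eye(2))
```

The cost is a few eigendecompositions of d-by-d matrices per pair, with no optimization over couplings, which is the kind of saving a closed-form distance offers.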

📝 Abstract
With the increasing availability of data objects in the form of probability distributions, there is a growing need for statistical methods tailored to distributional data. Distance measures, especially the pairwise distance matrix between data objects, provide the foundation for a wide range of modern data analysis methods, such as clustering, multidimensional scaling, and distance-based regression, among others. The Wasserstein distance is commonly used with distributional data due to its compelling optimal transport property. However, while the Wasserstein distance can be efficiently computed for univariate distributions, its application to multivariate distributions is limited due to high computational costs. To address these scalability issues, we introduce the Nonparanormal Transport (NPT) metric, a closed-form distance based on the flexible nonparanormal distribution family for modeling skewed and non-Gaussian multivariate data. Simulation studies demonstrate that NPT maintains a high level of agreement with the Wasserstein distance, while being at least 1000 times faster than its efficient variants when computing a 100-distribution pairwise distance matrix in both 2 and 5 dimensions. We illustrate the utility of NPT through a multidimensional scaling analysis of bivariate oxygen desaturation distributions of 723 individuals with sleep apnea in the Sleep Heart Health Study.
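As a rough illustration of why the univariate case is cheap: for two samples of equal size, the empirical 2-Wasserstein distance reduces to matching sorted values. The sketch below uses that fact to fill a pairwise distance matrix, the input to methods such as multidimensional scaling. It is not the NPT metric; the names `w2_univariate` and `pairwise_matrix` and the simulated Gaussian samples are assumptions made for the example.

```python
import math
import random


def w2_univariate(x, y):
    """Empirical 2-Wasserstein distance between two equal-size 1-D samples."""
    if len(x) != len(y):
        raise ValueError("this sketch assumes equal sample sizes")
    xs, ys = sorted(x), sorted(y)  # optimal 1-D coupling pairs order statistics
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xs, ys)) / len(xs))


def pairwise_matrix(samples, dist):
    """Symmetric matrix of distances between every pair of samples."""
    n = len(samples)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):  # each unordered pair is computed once
            d[i][j] = d[j][i] = dist(samples[i], samples[j])
    return d


# Simulated data (assumption): three samples with means 0, 1, and 3.
rng = random.Random(0)
samples = [[rng.gauss(mu, 1.0) for _ in range(200)] for mu in (0.0, 1.0, 3.0)]
D = pairwise_matrix(samples, w2_univariate)
```

A matrix over 100 distributions needs 4,950 distinct pairs, so the per-pair cost dominates the total runtime. Sorting keeps that cost low in one dimension, but no such shortcut carries over directly to multivariate samples.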
Problem

Research questions and friction points this paper is trying to address.

Wasserstein distance
multivariate distributions
distance computation
computational scalability
distributional data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Nonparanormal Transport
Wasserstein distance
multivariate distributions
computational efficiency
distributional data
Edward Shao
Department of Biostatistics, University of Michigan
Junyoung Park
Department of Biostatistics, University of Michigan
Naresh Punjabi
Miller School of Medicine, University of Miami
Hui Jiang
Professor of Biostatistics, University of Michigan
Biostatistics, Bioinformatics, Statistics, Genomics, Statistical Computing
Irina Gaynanova
University of Michigan
Biostatistics, Statistics