🤖 AI Summary
This work addresses the high computational cost of the Wasserstein distance for multivariate distributions, which limits its use with distributional data. The authors propose the Nonparanormal Transport (NPT) metric, a closed-form distance built on the nonparanormal distribution family, which pairs monotone marginal transformations with a Gaussian copula and can therefore model skewed and non-Gaussian multivariate data. In simulations, NPT maintains a high level of agreement with the Wasserstein distance while being at least 1,000 times faster than efficient Wasserstein variants when computing a pairwise distance matrix for 100 distributions, in both 2 and 5 dimensions. The approach is illustrated with a multidimensional scaling analysis of bivariate oxygen desaturation distributions from 723 individuals with sleep apnea in the Sleep Heart Health Study.
📝 Abstract
With the increasing availability of data objects in the form of probability distributions, there is a growing need for statistical methods tailored to distributional data. Distance measures, especially the pairwise distance matrix between data objects, provide the foundation for a wide range of modern data analysis methods, such as clustering, multidimensional scaling, and distance-based regression, among others. The Wasserstein distance is commonly used with distributional data due to its compelling optimal transport property. However, while the Wasserstein distance can be efficiently computed for univariate distributions, its application to multivariate distributions is limited due to high computational costs. To address these scalability issues, we introduce the Nonparanormal Transport (NPT) metric, a closed-form distance based on the flexible nonparanormal distribution family for modeling skewed and non-Gaussian multivariate data. Simulation studies demonstrate that NPT maintains a high level of agreement with the Wasserstein distance, while being at least 1000 times faster than its efficient variants when computing a 100-distribution pairwise distance matrix in both 2 and 5 dimensions. We illustrate the utility of NPT through a multidimensional scaling analysis of bivariate oxygen desaturation distributions of 723 individuals with sleep apnea in the Sleep Heart Health Study.
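To make the univariate case mentioned in the abstract concrete, here is a minimal sketch of why the Wasserstein distance is cheap in one dimension: for two empirical samples of equal size, the optimal coupling simply matches sorted values. This is not the NPT metric, whose formula is not given in the abstract. The function names `wasserstein_1d` and `pairwise_distance_matrix` and the toy samples are assumptions made for illustration only.

```python
from itertools import combinations


def wasserstein_1d(x, y, p=2):
    """Empirical p-Wasserstein distance between two equal-size 1-D samples.

    Assumption of this sketch: both samples have the same number of points,
    so the optimal transport plan pairs the i-th smallest value of x with
    the i-th smallest value of y.
    """
    if len(x) != len(y):
        raise ValueError("this sketch assumes equal sample sizes")
    xs, ys = sorted(x), sorted(y)
    # Average transport cost under the sorted (monotone) coupling.
    cost = sum(abs(a - b) ** p for a, b in zip(xs, ys)) / len(xs)
    return cost ** (1.0 / p)


def pairwise_distance_matrix(samples, p=2):
    """Symmetric matrix of pairwise distances between 1-D samples."""
    n = len(samples)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = wasserstein_1d(samples[i], samples[j], p)
    return dist


# Toy data (hypothetical): three small univariate samples.
samples = [[0.0, 1.0, 2.0], [1.0, 2.0, 3.0], [2.0, 0.0, 4.0]]
D = pairwise_distance_matrix(samples)
```

Each distance here costs only a sort, so a full pairwise matrix is inexpensive. In higher dimensions no such sorting shortcut exists, which is the scalability gap the abstract says a closed-form metric is meant to close.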