AI Summary
Linear modeling of high-dimensional probability measures remains challenging due to the non-Euclidean geometry of the 2-Wasserstein space.
Method: We propose the Linear Barycentric Coding Model (LBCM), grounded in the Linear Optimal Transport (LOT) metric, and integrate LOT with variational analysis and probability measure embedding to design efficient covariance estimation and missing-data imputation algorithms.
Contribution/Results: We derive a closed-form solution to the variational problem that defines the LBCM and prove that the LBCM coincides with the set of 2-Wasserstein barycenters when the measures are compatible. Theoretically, we show that an LBCM built from a simple family represents every probability measure on the closed unit interval exactly, show that the natural two-dimensional analogue fails (the extension beyond one dimension is left open), and provide finite-sample guarantees for the computational methods. Empirically, we demonstrate LBCM on measure synthesis, covariance estimation, and data imputation, providing a framework for linearized analysis of probability measures that combines theoretical rigor with practical applicability.
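For orientation, the closed-form solution mentioned above can be written out under the standard LOT definitions. This is a sketch, not the paper's own statement: the symbols $\mu_0$ (an absolutely continuous reference measure), $T_i$ (the optimal transport map from $\mu_0$ to $\mu_i$), and $\lambda$ (weights on the probability simplex) are notation assumed here and may differ from the paper's.

```latex
% LOT distance relative to the reference measure \mu_0 (assumed notation)
\[
d_{\mathrm{LOT},\mu_0}(\nu,\mu)^2
  = \int \bigl\| T_\nu(x) - T_\mu(x) \bigr\|^2 \, d\mu_0(x)
\]
% Variational problem and its closed-form minimizer
\[
\nu_\lambda
  = \operatorname*{arg\,min}_{\nu} \sum_{i=1}^{m} \lambda_i \, d_{\mathrm{LOT},\mu_0}(\nu,\mu_i)^2
  = \Bigl( \sum_{i=1}^{m} \lambda_i T_i \Bigr)_{\#} \mu_0 ,
  \qquad \lambda_i \ge 0, \quad \sum_{i=1}^{m} \lambda_i = 1 .
\]
```

The second equality follows because $\sum_i \lambda_i \|T - T_i\|^2_{L^2(\mu_0)}$ differs from $\|T - \sum_i \lambda_i T_i\|^2_{L^2(\mu_0)}$ only by a constant, and a convex combination of gradients of convex functions is again the gradient of a convex function, hence itself an optimal transport map from $\mu_0$.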
Abstract
We propose the linear barycentric coding model (LBCM) which utilizes the linear optimal transport (LOT) metric for analysis and synthesis of probability measures. We provide a closed-form solution to the variational problem characterizing the probability measures in the LBCM and establish equivalence of the LBCM to the set of 2-Wasserstein barycenters in the special case of compatible measures. Computational methods for synthesizing and analyzing measures in the LBCM are developed with finite sample guarantees. One of our main theoretical contributions is to identify an LBCM, expressed in terms of a simple family, which is sufficient to express all probability measures on the closed unit interval. We show that a natural analogous construction of an LBCM in 2 dimensions fails, and we leave it as an open problem to identify the proper extension in more than 1 dimension. We conclude by demonstrating the utility of LBCM for covariance estimation and data imputation.
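The synthesis and analysis steps described in the abstract can be illustrated in one dimension, where optimal transport maps are available in closed form. The sketch below is not the authors' implementation. It assumes the reference measure is Uniform[0, 1], so that the optimal map to each measure is that measure's quantile function, and it approximates the maps by empirical quantiles on a grid. The function names (`quantiles`, `synthesize`, `analyze`, `project_simplex`) are hypothetical, and the analysis step uses plain projected gradient descent as a stand-in for whatever solver the paper uses.

```python
import numpy as np

def quantiles(samples, grid):
    """Empirical quantile function of 1-D samples, evaluated on grid in (0, 1)."""
    return np.quantile(np.asarray(samples, dtype=float), grid)

def synthesize(Q, lam):
    """Quantile function of (sum_i lam_i T_i)#mu_0, assuming mu_0 = Uniform[0, 1]."""
    return lam @ Q

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1), 0.0)

def analyze(Q, q_target, steps=20000):
    """Recover simplex weights minimizing ||lam @ Q - q_target||^2
    by projected gradient descent (an assumed, simple solver)."""
    m, n = Q.shape
    G = Q @ Q.T / n                      # Gram matrix of the transport maps
    b = Q @ q_target / n
    step = 1.0 / np.linalg.norm(G, 2)    # 1 / Lipschitz constant of the gradient
    lam = np.full(m, 1.0 / m)
    for _ in range(steps):
        lam = project_simplex(lam - step * (G @ lam - b))
    return lam

# Illustrative data: three reference measures given by samples.
rng = np.random.default_rng(0)
grid = (np.arange(200) + 0.5) / 200
refs = [rng.normal(0, 1, 5000), rng.exponential(1.0, 5000), rng.uniform(-1, 1, 5000)]
Q = np.stack([quantiles(s, grid) for s in refs])

lam_true = np.array([0.2, 0.5, 0.3])
q_mix = synthesize(Q, lam_true)          # synthesis: weights -> measure
lam_hat = analyze(Q, q_mix)              # analysis: measure -> weights
```

In one dimension all measures are compatible, so `q_mix` is also the quantile function of the 2-Wasserstein barycenter with weights `lam_true`. In higher dimensions the transport maps would have to be estimated numerically and this shortcut does not apply.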