🤖 AI Summary
This paper introduces distributional matrix completion under sparse observations: recovering the matrix of true probability distributions when only a subset of entries is observed, each observed entry being an empirical distribution rather than a scalar. To this end, the authors generalize the nearest neighbors method to the Wasserstein space, proposing an optimal transport-based framework that combines Wasserstein barycenters, latent factor modeling, and distribution-valued nearest-neighbor estimation. Theoretically, they establish recovery guarantees in the Wasserstein norm under a latent factor model. Empirically, the approach yields better pointwise distribution estimates than using each entry's observed samples alone, accurately recovers distributional functionals such as standard deviation and value-at-risk, and naturally accommodates heteroscedastic noise.
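The key primitive behind such a method is the one-dimensional Wasserstein distance, which has a closed form in terms of quantile functions. Below is a minimal NumPy sketch (not code from the paper) that approximates the squared 2-Wasserstein distance between two empirical samples on a quantile grid; `grid_size` is an illustrative discretization parameter.

```python
import numpy as np

def wasserstein2_1d(x, y, grid_size=1000):
    """Approximate squared 2-Wasserstein distance between two 1-D
    empirical distributions via their quantile functions.

    In one dimension, W_2^2(P, Q) = integral over u in (0, 1) of
    (F_P^{-1}(u) - F_Q^{-1}(u))^2 du, approximated here on a grid.
    """
    u = (np.arange(grid_size) + 0.5) / grid_size  # quantile levels in (0, 1)
    qx = np.quantile(np.asarray(x), u)            # empirical quantiles of x
    qy = np.quantile(np.asarray(y), u)            # empirical quantiles of y
    return np.mean((qx - qy) ** 2)
```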
📝 Abstract
We introduce the problem of distributional matrix completion: Given a sparsely observed matrix of empirical distributions, we seek to impute the true distributions associated with both observed and unobserved matrix entries. This is a generalization of traditional matrix completion, where the observations per matrix entry are scalar-valued. To do so, we utilize tools from optimal transport to generalize the nearest neighbors method to the distributional setting. Under a suitable latent factor model on probability distributions, we establish that our method recovers the distributions in the Wasserstein norm. We demonstrate through simulations that our method is able to (i) provide better distributional estimates for an entry compared to using observed samples for that entry alone, (ii) yield accurate estimates of distributional quantities such as standard deviation and value-at-risk, and (iii) inherently support heteroscedastic noise. We also prove novel asymptotic results for Wasserstein barycenters over one-dimensional distributions.
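For intuition, here is a hedged sketch of how these ingredients could be assembled into a distributional nearest-neighbors estimate. It relies on the standard closed form for 2-Wasserstein barycenters of one-dimensional distributions (average the quantile functions); the `impute_entry` routine, its `obs` data layout, and the `radius` threshold are hypothetical illustrations, not the paper's estimator.

```python
import numpy as np

def quantile_curve(samples, u):
    """Empirical quantile function of a 1-D sample at levels u."""
    return np.quantile(np.asarray(samples), u)

def wasserstein_barycenter_1d(list_of_samples, grid_size=1000):
    """2-Wasserstein barycenter of 1-D distributions: in one dimension,
    its quantile function is the pointwise average of the input quantile
    functions (a standard closed form). Returns (levels, quantiles)."""
    u = (np.arange(grid_size) + 0.5) / grid_size
    curves = np.stack([quantile_curve(s, u) for s in list_of_samples])
    return u, curves.mean(axis=0)

def impute_entry(obs, i, j, radius, grid_size=1000):
    """Toy distributional nearest-neighbors imputation for entry (i, j).

    `obs` maps (row, col) -> 1-D sample array; `radius` (a hypothetical
    tuning parameter) bounds the average squared W_2 discrepancy between
    rows on their shared observed columns. The estimate is the Wasserstein
    barycenter of the neighbors' column-j distributions."""
    u = (np.arange(grid_size) + 0.5) / grid_size
    neighbors = []
    for r in {row for (row, _) in obs}:
        if r == i or (r, j) not in obs:
            continue
        # Columns observed in both row i and row r (excluding target column j).
        shared = [c for (row, c) in obs if row == r and (i, c) in obs and c != j]
        if not shared:
            continue
        dist = np.mean([
            np.mean((quantile_curve(obs[i, c], u) - quantile_curve(obs[r, c], u)) ** 2)
            for c in shared
        ])
        if dist <= radius:
            neighbors.append(obs[r, j])
    return wasserstein_barycenter_1d(neighbors, grid_size) if neighbors else None
```

Averaging quantile functions both aggregates the neighbors' samples and smooths per-entry sampling noise, which is one way to see why pooling neighbors can beat using an entry's own samples alone.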