🤖 AI Summary
This paper introduces distributional matrix completion under sparse observations: recovering the matrix of true probability distributions when only a subset of entries is observed, each observed entry being an empirical distribution rather than a scalar. To this end, the authors generalize the nearest neighbors method to the Wasserstein space, proposing an optimal transport-based framework that combines Wasserstein barycenters, latent factor modeling, and distribution-valued nearest-neighbor estimation. Theoretically, they establish recovery guarantees in the Wasserstein norm under a latent factor model. Empirically, the approach yields better pointwise distribution estimates than using each entry's observed samples alone, accurately recovers distributional functionals such as standard deviation and value-at-risk, and naturally accommodates heteroscedastic noise.
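The key primitive behind such a method is the one-dimensional Wasserstein distance, which has a closed form in terms of quantile functions. Below is a minimal NumPy sketch (not code from the paper) that approximates the squared 2-Wasserstein distance between two empirical samples on a quantile grid; `grid_size` is an illustrative discretization parameter.

```python
import numpy as np

def wasserstein2_1d(x, y, grid_size=1000):
    """Approximate squared 2-Wasserstein distance between two 1-D
    empirical distributions via their quantile functions.

    In one dimension, W_2^2(P, Q) = integral over u in (0, 1) of
    (F_P^{-1}(u) - F_Q^{-1}(u))^2 du, approximated here on a grid.
    """
    u = (np.arange(grid_size) + 0.5) / grid_size  # quantile levels in (0, 1)
    qx = np.quantile(np.asarray(x), u)            # empirical quantiles of x
    qy = np.quantile(np.asarray(y), u)            # empirical quantiles of y
    return np.mean((qx - qy) ** 2)
```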
📝 Abstract
We introduce the problem of distributional matrix completion: Given a sparsely observed matrix of empirical distributions, we seek to impute the true distributions associated with both observed and unobserved matrix entries. This is a generalization of traditional matrix completion, where the observations per matrix entry are scalar-valued. To do so, we utilize tools from optimal transport to generalize the nearest neighbors method to the distributional setting. Under a suitable latent factor model on probability distributions, we establish that our method recovers the distributions in the Wasserstein norm. We demonstrate through simulations that our method is able to (i) provide better distributional estimates for an entry compared to using observed samples for that entry alone, (ii) yield accurate estimates of distributional quantities such as standard deviation and value-at-risk, and (iii) inherently support heteroscedastic noise. We also prove novel asymptotic results for Wasserstein barycenters over one-dimensional distributions.
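For intuition, here is a hedged sketch of how these ingredients could be assembled into a distributional nearest-neighbors estimate. It relies on the standard closed form for 2-Wasserstein barycenters of one-dimensional distributions (average the quantile functions); the `impute_entry` routine, its `obs` data layout, and the `radius` threshold are hypothetical illustrations, not the paper's estimator.

```python
import numpy as np

def quantile_curve(samples, u):
    """Empirical quantile function of a 1-D sample at levels u."""
    return np.quantile(np.asarray(samples), u)

def wasserstein_barycenter_1d(list_of_samples, grid_size=1000):
    """2-Wasserstein barycenter of 1-D distributions: in one dimension,
    its quantile function is the pointwise average of the input quantile
    functions (a standard closed form). Returns (levels, quantiles)."""
    u = (np.arange(grid_size) + 0.5) / grid_size
    curves = np.stack([quantile_curve(s, u) for s in list_of_samples])
    return u, curves.mean(axis=0)

def impute_entry(obs, i, j, radius, grid_size=1000):
    """Toy distributional nearest-neighbors imputation for entry (i, j).

    `obs` maps (row, col) -> 1-D sample array; `radius` (a hypothetical
    tuning parameter) bounds the average squared W_2 discrepancy between
    rows on their shared observed columns. The estimate is the Wasserstein
    barycenter of the neighbors' column-j distributions."""
    u = (np.arange(grid_size) + 0.5) / grid_size
    neighbors = []
    for r in {row for (row, _) in obs}:
        if r == i or (r, j) not in obs:
            continue
        # Columns observed in both row i and row r (excluding target column j).
        shared = [c for (row, c) in obs if row == r and (i, c) in obs and c != j]
        if not shared:
            continue
        dist = np.mean([
            np.mean((quantile_curve(obs[i, c], u) - quantile_curve(obs[r, c], u)) ** 2)
            for c in shared
        ])
        if dist <= radius:
            neighbors.append(obs[r, j])
    return wasserstein_barycenter_1d(neighbors, grid_size) if neighbors else None
```

Averaging quantile functions both aggregates the neighbors' samples and smooths per-entry sampling noise, which is one way to see why pooling neighbors can beat using an entry's own samples alone.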