Score Matching With Missing Data

📅 2025-05-31

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the lack of theoretical and algorithmic foundations for score matching when data are partially missing over any subset of the coordinates. We adapt score matching and its major extensions to this setting through two general-purpose variants: Importance-Weighted Score Matching (IW-SM), which performs especially well in small-sample, lower-dimensional settings, and Variational Score Matching (VI-SM), which is strongest in more complex, high-dimensional settings. Theoretically, we provide finite-sample bounds for the IW approach in finite-domain settings. Empirically, we demonstrate the variational approach on graphical model estimation tasks using both real and simulated data. Together, the two methods extend score matching to incomplete-data scenarios.
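
For context, the classical score matching objective for fully observed data (Hyvärinen's formulation) is sketched below. The notation ($p_\theta$, $\psi_\theta$, $J$) is assumed here for illustration and is not taken from the paper.

```latex
% Model score: gradient of the log-density with respect to the data x,
% which does not depend on the normalising constant of p_theta.
\psi_\theta(x) = \nabla_x \log p_\theta(x)

% Score matching objective, up to a constant independent of theta:
J(\theta) = \mathbb{E}_{x \sim p_{\mathrm{data}}}
  \left[ \tfrac{1}{2} \lVert \psi_\theta(x) \rVert^2
       + \sum_{i=1}^{d} \frac{\partial \psi_{\theta,i}(x)}{\partial x_i} \right]
```

Every term requires all $d$ coordinates of $x$, which is why the objective cannot be evaluated directly on samples with missing coordinates.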

📝 Abstract

Score matching is a vital tool for learning the distribution of data, with applications across many areas including diffusion processes, energy-based modelling, and graphical model estimation. Despite all these applications, little work explores its use when data is incomplete. We address this by adapting score matching (and its major extensions) to work with missing data in a flexible setting where data can be partially missing over any subset of the coordinates. We provide two separate score matching variations for general use: an importance weighting (IW) approach and a variational approach. We provide finite-sample bounds for our IW approach in finite-domain settings and show it to have especially strong performance in small-sample, lower-dimensional cases. Complementing this, we show our variational approach to be strongest in more complex, high-dimensional settings, which we demonstrate on graphical model estimation tasks on both real and simulated data.
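
The abstract does not spell out the estimators, so the following is only a minimal sketch of the general idea behind importance weighting in this setting, not the paper's method. It relies on the standard identity that the score of the observed coordinates equals the joint score averaged over the missing coordinates given the observed ones, and it approximates that average with self-normalised importance weights. The toy Gaussian model, the proposal distribution, and all function names are assumptions made for illustration.

```python
import math
import random

# Hypothetical toy model: unnormalised zero-mean 2-D Gaussian with
# precision matrix P. Coordinate x1 is observed, x2 is missing.
P = ((2.0, 0.8), (0.8, 1.0))


def log_p_tilde(x1, x2):
    """Unnormalised log-density of the joint model."""
    return -0.5 * (P[0][0] * x1 * x1 + 2 * P[0][1] * x1 * x2 + P[1][1] * x2 * x2)


def joint_score_x1(x1, x2):
    """Derivative of log_p_tilde with respect to the observed coordinate x1."""
    return -(P[0][0] * x1 + P[0][1] * x2)


def iw_marginal_score(x1, n_samples=20000, proposal_std=3.0, seed=0):
    """Self-normalised importance-weighted estimate of d/dx1 log p(x1).

    Missing values are drawn from a N(0, proposal_std^2) proposal q and
    weighted by p_tilde(x1, x2) / q(x2); constants cancel on normalisation.
    """
    rng = random.Random(seed)
    draws = [rng.gauss(0.0, proposal_std) for _ in range(n_samples)]
    # log weight = log p_tilde - log q, with log q = -0.5 * (x2 / std)^2 + const
    log_w = [log_p_tilde(x1, x2) + 0.5 * (x2 / proposal_std) ** 2 for x2 in draws]
    shift = max(log_w)  # subtract the max for numerical stability
    w = [math.exp(lw - shift) for lw in log_w]
    total = sum(w)
    return sum(wi * joint_score_x1(x1, x2) for wi, x2 in zip(w, draws)) / total


# Closed-form check: the marginal of x1 is N(0, Sigma11), Sigma11 = P22 / det(P).
det_P = P[0][0] * P[1][1] - P[0][1] * P[1][0]
sigma11 = P[1][1] / det_P
x1 = 0.7
estimate = iw_marginal_score(x1)
exact = -x1 / sigma11
print(estimate, exact)
```

In this toy case the marginal score is available in closed form, which makes it possible to check the estimate; in realistic models it is not, and the quality of the approximation depends on how well the proposal covers the conditional distribution of the missing coordinates.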

Problem

Research questions and friction points this paper is trying to address.

Adapting score matching for incomplete data scenarios

Proposing two methods for missing data score matching

Evaluating performance in low and high-dimensional settings

Innovation

Methods, ideas, or system contributions that make the work stand out.

Adapts score matching for missing data

Introduces importance weighting approach

Proposes variational approach for high-dimensional settings

🔎 Similar Papers

Data Imputation by Pursuing Better Classification: A Supervised Kernel-Based Method