Importance Weighted Score Matching for Diffusion Samplers with Enhanced Mode Coverage

📅 2025-05-26
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the challenge of training neural samplers to achieve full coverage of multimodal distributions without access to target samples. We propose the first principled framework optimizing the forward KL divergence. Its core innovation is importance-weighted score matching (IW-SM), which—combined with Monte Carlo importance sampling—enables unbiased gradient estimation and guaranteed mode coverage using only unnormalized density evaluations. Theoretical analysis characterizes the bias–variance trade-off in the estimator. Experiments on a 120-mode Gaussian mixture and a symmetric particle system demonstrate consistent superiority over state-of-the-art methods across all metrics: Wasserstein distance, maximum mean discrepancy (MMD), and coverage. Our approach effectively mitigates mode collapse—a well-known limitation of inverse KL-based methods—while requiring no ground-truth samples.

Technology Category

Application Category

📝 Abstract
Training neural samplers directly from unnormalized densities without access to target distribution samples presents a significant challenge. A critical desideratum in these settings is achieving comprehensive mode coverage, ensuring the sampler captures the full diversity of the target distribution. However, prevailing methods often circumvent the lack of target data by optimizing reverse KL-based objectives. Such objectives inherently exhibit mode-seeking behavior, potentially leading to incomplete representation of the underlying distribution. While alternative approaches strive for better mode coverage, they typically rely on implicit mechanisms like heuristics or iterative refinement. In this work, we propose a principled approach for training diffusion-based samplers by directly targeting an objective analogous to the forward KL divergence, which is conceptually known to encourage mode coverage. We introduce extit{Importance Weighted Score Matching}, a method that optimizes this desired mode-covering objective by re-weighting the score matching loss using tractable importance sampling estimates, thereby overcoming the absence of target distribution data. We also provide theoretical analysis of the bias and variance for our proposed Monte Carlo estimator and the practical loss function used in our method. Experiments on increasingly complex multi-modal distributions, including 2D Gaussian Mixture Models with up to 120 modes and challenging particle systems with inherent symmetries -- demonstrate that our approach consistently outperforms existing neural samplers across all distributional distance metrics, achieving state-of-the-art results on all benchmarks.
Problem

Research questions and friction points this paper is trying to address.

Training neural samplers without target distribution samples
Achieving comprehensive mode coverage in distribution sampling
Overcoming mode-seeking bias in existing sampling methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Importance Weighted Score Matching for diffusion samplers
Optimizes forward KL divergence for mode coverage
Uses importance sampling to estimate loss weights
🔎 Similar Papers
No similar papers found.
C
Chenguang Wang
School of Data Science, The Chinese University of Hong Kong, Shenzhen; Shenzhen Research Institute of Big Data
X
Xiaoyu Zhang
The Academy of Mathematics and Systems Science, Chinese Academy of Sciences
K
Kaiyuan Cui
The Academy of Mathematics and Systems Science, Chinese Academy of Sciences
Weichen Zhao
Weichen Zhao
Nankai University
Y
Yongtao Guan
School of Data Science, The Chinese University of Hong Kong, Shenzhen
Tianshu Yu
Tianshu Yu
The Chinese University of Hong Kong, Shenzhen
Machine LearningOptimizationAI4Science