Preconditioned Regularized Wasserstein Proximal Sampling

📅 2025-09-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses efficient sampling from Gibbs distributions by evolving finitely many particles, tackling challenges such as numerical instability, slow convergence, and degraded performance on non-log-concave targets. We propose a preconditioned noise-free sampling framework: computable score estimates are derived via a regularized Wasserstein proximal operator; a kernel-based update rule is obtained by applying the Cole–Hopf transformation to coupled anisotropic heat equations; and the diffusion term is interpreted as a modified self-attention mechanism, drawing a connection to Transformer architectures. For quadratic potentials, we provide non-asymptotic convergence bounds and an explicit bias characterization for the discrete-time scheme. Experiments demonstrate improved convergence speed and particle stability on log-concave and non-log-concave targets, with strong results on Bayesian total-variation regularized image deconvolution and competitive or better performance on non-convex Bayesian neural network training.
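
For orientation, here is a minimal sketch of the kind of dynamics such noise-free samplers follow; the notation below is assumed for illustration rather than taken from the paper:

```latex
% Hedged sketch: deterministic (noise-free) particle dynamics targeting
% \pi \propto e^{-\beta V}; the intractable score of \rho_t is replaced by the
% score of a regularized Wasserstein proximal of \rho_t, which admits a kernel formula.
\[
  \frac{\mathrm{d}X_t}{\mathrm{d}t}
    \;=\; -\nabla V(X_t) \;-\; \beta^{-1}\,\nabla \log \rho_t(X_t),
  \qquad
  \nabla \log \rho_t \;\approx\; \nabla \log\!\big(\mathrm{WProx}_{T,\beta}\,\rho_t\big),
\]
% where \rho_t is the law of X_t and \mathrm{WProx}_{T,\beta} denotes a
% regularized Wasserstein proximal operator with proximal step T and
% regularization strength \beta^{-1} (symbols assumed, not from the paper).
```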

📝 Abstract
We consider sampling from a Gibbs distribution by evolving finitely many particles. We propose a preconditioned version of a recently proposed noise-free sampling method, in which the score function is approximated by the numerically tractable score of a regularized Wasserstein proximal operator. This is derived via a Cole–Hopf transformation on coupled anisotropic heat equations, yielding a kernel formulation for the preconditioned regularized Wasserstein proximal. The diffusion component of the proposed method is also interpreted as a modified self-attention block, as in transformer architectures. For quadratic potentials, we provide a discrete-time non-asymptotic convergence analysis and explicitly characterize the bias, which depends on the regularization and is independent of the step size. Experiments demonstrate acceleration and particle-level stability on problems ranging from log-concave and non-log-concave toy examples to Bayesian total-variation regularized image deconvolution, and competitive or better performance on non-convex Bayesian neural network training when variable preconditioning matrices are used.
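
To make the kernel formulation more concrete, the following is a hedged sketch of the kernel form known from prior work on the unpreconditioned regularized Wasserstein proximal; the exact constants and the preconditioned (anisotropic) generalization used in this paper may differ:

```latex
% Sketch only: kernel form of the regularized Wasserstein proximal, as in
% earlier work on noise-free sampling; the constants and the preconditioning
% below are assumptions, not taken verbatim from this paper.
\[
  (\mathrm{WProx}_{T,\beta}\,\rho)(x)
    \;=\; \int_{\mathbb{R}^d} K(x,y)\,\rho(y)\,\mathrm{d}y,
  \qquad
  K(x,y)
    \;=\;
    \frac{\exp\!\big(-\tfrac{\beta}{2}\,V(x) - \tfrac{\beta}{4T}\,\|x-y\|^2\big)}
         {\displaystyle\int_{\mathbb{R}^d}
            \exp\!\big(-\tfrac{\beta}{2}\,V(z) - \tfrac{\beta}{4T}\,\|z-y\|^2\big)\,\mathrm{d}z}.
\]
% A preconditioned variant would presumably replace \|x-y\|^2 by a weighted
% distance (x-y)^\top M^{-1}(x-y) for a preconditioning matrix M.
```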
Problem

Research questions and friction points this paper is trying to address.

Sampling from Gibbs distributions using particle evolution
Analyzing convergence and bias for quadratic potential cases
Evaluating performance on log-concave and non-log-concave examples
Innovation

Methods, ideas, or system contributions that make the work stand out.

Preconditioned regularized Wasserstein proximal sampling
Kernel formulation via Cole-Hopf transformation
Modified self-attention diffusion component (see the particle-update sketch below)
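
As a rough illustration of why the kernel score yields a self-attention-like update over particles, here is a minimal Python (NumPy) sketch. All names (`attention_score`, `brwp_like_step`), the isotropic kernel, the toy quadratic potential, and the specific constants are illustrative assumptions following the hedged kernel sketch above, not the paper's exact preconditioned scheme.

```python
import numpy as np

def grad_V(x):
    """Gradient of a toy quadratic potential V(x) = ||x||^2 / 2 (illustrative stand-in)."""
    return x

def softmax(logits, axis=-1):
    """Numerically stable softmax along the given axis."""
    logits = logits - logits.max(axis=axis, keepdims=True)
    w = np.exp(logits)
    return w / w.sum(axis=axis, keepdims=True)

def attention_score(X, beta=1.0, T=0.1):
    """Self-attention-like estimate of the proximal score at each particle.

    Uses the empirical-kernel approximation rho_T(x) ~ (1/N) sum_j K(x, x_j)
    with the hedged kernel sketched above; the V(x)-dependent factor is common
    to all j and cancels inside the softmax, leaving Gaussian attention weights.
    Returns an (N, d) array approximating grad log rho_T at each particle.
    """
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # (N, N) pairwise ||x_i - x_j||^2
    W = softmax(-beta / (4.0 * T) * sq_dists, axis=1)                 # attention weights over particles
    X_bar = W @ X                                                     # attention-weighted particle average
    return -0.5 * beta * grad_V(X) - beta / (2.0 * T) * (X - X_bar)

def brwp_like_step(X, step=0.05, beta=1.0, T=0.1):
    """One hypothetical Euler step of the noise-free dynamics
    dx/dt = -grad V(x) - (1/beta) grad log rho(x), with the score replaced
    by the attention-based proximal estimate above."""
    drift = -grad_V(X) - (1.0 / beta) * attention_score(X, beta=beta, T=T)
    return X + step * drift

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(256, 2)) * 3.0      # particles, initially overdispersed
    for _ in range(200):
        X = brwp_like_step(X)
    print("empirical mean:", X.mean(axis=0), "empirical var:", X.var(axis=0))
```

The softmax over pairwise squared distances plays the role of attention weights and the particles play the role of values, which is the sense in which the diffusion component resembles a self-attention block; a preconditioned variant would presumably replace the Euclidean distance by a preconditioner-weighted one.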
Hong Ye Tan
Hedrick Assistant Adjunct Professor, UCLA
Machine Learning · Optimization · Inverse Problems
Stanley Osher
Department of Mathematics, University of California, Los Angeles, CA 90095
Wuchen Li
Department of Mathematics, University of South Carolina, Columbia, SC 29208