🤖 AI Summary
This work addresses efficient sampling from Gibbs distributions using finite particle systems, tackling challenges that include numerical instability, slow convergence, and degraded performance on non-log-concave targets. We propose a preconditioned noise-free sampling framework: computable score estimates are derived via regularized Wasserstein proximal operators; kernel-based update rules are obtained by applying the Cole–Hopf transformation to coupled anisotropic heat equations; and the diffusion term is interpreted as a modified self-attention mechanism, establishing for the first time a theoretical connection to Transformer architectures. For quadratic potentials, we provide non-asymptotic convergence bounds and an explicit characterization of the bias of the discrete-time scheme. Experiments demonstrate substantial improvements in convergence speed and particle stability on both log-concave and non-log-concave distributions, strong performance on Bayesian image deconvolution, and competitive or better results in non-convex Bayesian neural network training.
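As a rough illustration of how such a kernelized update looks at the particle level, here is a minimal NumPy sketch of one noise-free step. The function name, the exact kernel exponents, the factor of 1/2 on the drift, and the placement of the preconditioner `M` are schematic assumptions for this example, not the paper's exact scheme.

```python
import numpy as np

def precond_noise_free_step(X, V, grad_V, step, T, beta, M=None):
    """One preconditioned, noise-free particle update (illustrative sketch).

    Over a finite particle set, the kernelized score of the regularized
    Wasserstein proximal reduces to a softmax over pairwise interactions;
    the constants below are schematic and may differ from the paper.

    X       : (N, d) particle positions
    V       : callable, V(X) -> (N,) potential values
    grad_V  : callable, grad_V(X) -> (N, d) potential gradients
    T, beta : proximal horizon and entropy regularization
    M       : (d, d) SPD preconditioning matrix; identity if None
    """
    N, d = X.shape
    M = np.eye(d) if M is None else M
    Minv = np.linalg.inv(M)

    # Pairwise anisotropic squared distances (x_i - x_j)^T M^{-1} (x_i - x_j).
    diff = X[:, None, :] - X[None, :, :]                # (N, N, d)
    sq = np.einsum('ijk,kl,ijl->ij', diff, Minv, diff)  # (N, N)

    # Attention-style logits: the proximal kernel acts as a query-key score,
    # with particle positions playing the role of queries and keys.
    logits = -(V(X)[None, :] + sq / (2.0 * T)) / (2.0 * beta)
    logits -= logits.max(axis=1, keepdims=True)         # numerical stability
    W = np.exp(logits)
    W /= W.sum(axis=1, keepdims=True)                   # softmax over j

    # "Diffusion" term: softmax-weighted displacements (the value vectors),
    # a deterministic repulsion that replaces injected Gaussian noise.
    repulsion = np.einsum('ij,ijk->ik', W, diff) / (2.0 * T)

    # Preconditioned probability-flow update.
    return X - step * (0.5 * grad_V(X) - repulsion) @ M
```

For a standard Gaussian target one would pass `V=lambda X: 0.5*(X**2).sum(axis=1)` and `grad_V=lambda X: X`. The row-wise softmax `W` plays the role of an attention matrix, with pairwise displacements as values, which is the sense in which the diffusion term resembles a modified self-attention block.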
📝 Abstract
We consider sampling from a Gibbs distribution by evolving finitely many particles. We propose a preconditioned version of a recently introduced noise-free sampling method, in which the score function is approximated by the numerically tractable score of a regularized Wasserstein proximal operator. The score is derived via a Cole--Hopf transformation applied to coupled anisotropic heat equations, which yields a kernel formula for the preconditioned regularized Wasserstein proximal operator. The diffusion component of the proposed method is also interpreted as a modified self-attention block, as in transformer architectures. For quadratic potentials, we provide a discrete-time non-asymptotic convergence analysis and explicitly characterize the bias, which depends on the regularization parameter but is independent of the step size. Experiments demonstrate acceleration and particle-level stability on problems ranging from log-concave and non-log-concave toy examples to Bayesian total-variation-regularized image deconvolution, and competitive or better performance on non-convex Bayesian neural network training when variable preconditioning matrices are used.
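To make the Cole--Hopf step concrete, the sketch below records an isotropic, β-regularized version of the construction. The normalizations and the terminal condition Φ(T,·) = -V are assumptions modeled on the standard kernel formula for regularized Wasserstein proximal operators, not quoted from the paper; the preconditioned variant replaces the Laplacians by anisotropic operators.

```latex
% Schematic sketch; normalizations are assumed, not quoted from the paper.
% The regularized Wasserstein proximal is characterized by the coupled system
\[
\partial_s \rho + \nabla\cdot(\rho\,\nabla\Phi) = \beta\,\Delta\rho,
\qquad
\partial_s \Phi + \tfrac{1}{2}\|\nabla\Phi\|^2 = -\beta\,\Delta\Phi,
\]
% with \(\rho(0,\cdot)=\rho_0\) and terminal data \(\Phi(T,\cdot)=-V\).
% The Cole--Hopf variables \(\hat\eta = e^{\Phi/(2\beta)}\) and
% \(\eta = \rho\,e^{-\Phi/(2\beta)}\) decouple this into backward and
% forward heat equations,
\[
\partial_s \hat\eta = -\beta\,\Delta\hat\eta,
\qquad
\partial_s \eta = \beta\,\Delta\eta,
\]
% both solvable by Gaussian convolution, which gives the kernel formula
\[
\rho_T(x) = \int
\frac{\exp\!\big(-\tfrac{1}{2\beta}\big(V(x) + \tfrac{\|x-y\|^2}{2T}\big)\big)}
     {\int \exp\!\big(-\tfrac{1}{2\beta}\big(V(z) + \tfrac{\|z-y\|^2}{2T}\big)\big)\,dz}
\;\rho_0(y)\,dy .
\]
% In the preconditioned variant, \(\Delta\) becomes the anisotropic operator
% \(\nabla\cdot(M\nabla)\), and \(\|x-y\|^2\) is replaced by
% \((x-y)^\top M^{-1}(x-y)\).
```

The sampler then uses the score \(\nabla \log \rho_T\), which is available in closed form up to a softmax normalization once the integrals are replaced by sums over particles.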