🤖 AI Summary
This work investigates error propagation in generative AI sampling from finite discrete datasets. Focusing on the two-stage sampling pipeline—denoising score matching followed by Langevin diffusion—under Gaussian data assumptions, we derive the first exact analytical expression for the end-to-end Wasserstein sampling error. Our key contribution is a novel characterization of this error as a kernel-type norm of the data's power spectral density, with the kernel determined by the method parameters; this quantitatively uncovers the coupling among anisotropy, noise magnitude, step size, and sample size. By integrating score matching theory, stochastic differential equation analysis, Wasserstein distance theory, and spectral methods, we obtain a fine-grained decomposition of the overall error into generalization error, score matching error, and diffusion error. The results provide a rigorous theoretical foundation and concrete optimization guidelines for noise- and step-size-adaptive sampling strategies.
📝 Abstract
Sampling from an unknown distribution, accessible only through discrete samples, is a fundamental problem at the core of generative AI. The current state-of-the-art methods follow a two-step process: first estimating the score function (the gradient of a smoothed log-density) and then applying a gradient-based sampling algorithm. The accuracy of the resulting distribution can be affected by several factors: the generalization error due to a finite number of initial samples, the error in score matching, and the diffusion error introduced by the sampling algorithm. In this paper, we analyze the sampling process in a simple yet representative setting: sampling from Gaussian distributions using a Langevin diffusion sampler. We provide a sharp analysis of the Wasserstein sampling error that arises from the multiple sources of error throughout the pipeline. This allows us to rigorously track how the anisotropy of the data distribution (encoded by its power spectrum) interacts with key parameters of the end-to-end sampling method, including the noise amplitude, the step sizes in both score matching and diffusion, and the number of initial samples. Notably, we show that the Wasserstein sampling error can be expressed as a kernel-type norm of the data power spectrum, where the specific kernel depends on the method parameters. This result provides a foundation for further analysis of the tradeoffs involved in optimizing sampling accuracy, such as adapting the noise amplitude to the choice of step sizes.
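The two-stage pipeline and its three error sources can be sketched numerically. The snippet below is a minimal illustration, not the paper's estimator: for 1-D Gaussian data the score of the noise-smoothed distribution is linear, so plugging in empirical moments stands in for a learned score model; the finite dataset size `n`, the smoothing noise `tau`, and the Langevin step size `h` correspond to the generalization, score matching, and diffusion error sources discussed above. All parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# True (unknown) data distribution: a 1-D Gaussian (illustrative choice).
mu_true, sigma_true = 1.0, 2.0

# Finite dataset -- the source of generalization error.
n = 500
data = rng.normal(mu_true, sigma_true, size=n)

# Stage 1: score "estimation". For Gaussian data smoothed with N(0, tau^2)
# noise, the score is linear: s(x) = -(x - mu) / (sigma^2 + tau^2).
# Empirical moments stand in for a trained score network (an assumption
# of this sketch, not the paper's method).
tau = 0.1
mu_hat = data.mean()
var_hat = data.var() + tau**2  # variance of data convolved with noise

def score(x):
    return -(x - mu_hat) / var_hat

# Stage 2: Langevin diffusion with the estimated score. The step size h
# is the source of the diffusion (discretization) error.
h, steps, m = 0.01, 2000, 5000
x = rng.normal(0.0, 1.0, size=m)  # arbitrary initialization
for _ in range(steps):
    x = x + h * score(x) + np.sqrt(2.0 * h) * rng.normal(size=m)

# End-to-end error: for two 1-D Gaussians the 2-Wasserstein distance has
# the closed form sqrt((mu1 - mu2)^2 + (sigma1 - sigma2)^2); we apply it
# to the target and a Gaussian fit of the generated samples.
w2 = np.sqrt((x.mean() - mu_true) ** 2 + (x.std() - sigma_true) ** 2)
print(f"estimated W2 sampling error: {w2:.3f}")
```

Varying `n`, `tau`, or `h` in this sketch lets one observe the same qualitative tradeoffs the analysis quantifies, e.g. a larger `h` inflates the stationary variance of the discretized chain and hence the Wasserstein error.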