Generative Drifting is Secretly Score Matching: a Spectral and Variational Perspective

📅 2026-03-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the lack of theoretical foundations for generative drifting by establishing, for the first time, an equivalence between generative drifting and score matching from spectral and variational perspectives. It interprets the drift operator under a Gaussian kernel as a difference of scores of smoothed distributions and explains the convergence mechanism by linearizing the McKean–Vlasov dynamics and analyzing them in Fourier space. Key contributions include: deriving the theoretical necessity of the stop-gradient operator from the JKO scheme; proposing an exponential bandwidth annealing strategy that reduces high-frequency convergence time from exponential to logarithmic; proving that zero drift is equivalent to distributional equality; clarifying why the Laplacian kernel outperforms the Gaussian kernel; and constructing a new drift operator based on the Sinkhorn divergence, giving stable training a rigorous variational foundation.
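The summary's central identity — that a Gaussian-kernel drift is a score difference on smoothed distributions — rests on the mean-shift fact that the kernel-weighted displacement equals $\sigma^2 \nabla \log p_\sigma$, where $p_\sigma$ is the sample distribution smoothed by the Gaussian kernel. A minimal 1-D numerical sketch of that fact (function names and the finite-difference check are illustrative, not from the paper):

```python
import numpy as np

def smoothed_log_density(x, samples, sigma):
    # log of (1/n) sum_i N(x; y_i, sigma^2), normalization constants dropped
    a = -(x - samples) ** 2 / (2 * sigma ** 2)
    m = a.max()
    return m + np.log(np.exp(a - m).sum())

def mean_shift_score(x, samples, sigma):
    # kernel-weighted mean shift divided by sigma^2:
    # sum_i w_i (y_i - x) / sigma^2, with w_i = softmax(-|x - y_i|^2 / 2 sigma^2)
    a = -(x - samples) ** 2 / (2 * sigma ** 2)
    w = np.exp(a - a.max())
    w /= w.sum()
    return (w * (samples - x)).sum() / sigma ** 2

rng = np.random.default_rng(0)
ys = rng.normal(1.0, 0.5, size=200)   # samples from p
x, sigma, h = 0.3, 0.4, 1e-5

# central finite difference of log p_sigma vs. the mean-shift formula
fd_score = (smoothed_log_density(x + h, ys, sigma)
            - smoothed_log_density(x - h, ys, sigma)) / (2 * h)
print(abs(fd_score - mean_shift_score(x, ys, sigma)))  # small: the two agree
```

Applying the same identity to samples from $q$ and subtracting gives a drift of the form $\nabla \log q_\sigma - \nabla \log p_\sigma$, which is the score-difference reading of the drift operator described above.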

📝 Abstract
Generative Modeling via Drifting has recently achieved state-of-the-art one-step image generation through a kernel-based drift operator, yet the success is largely empirical and its theoretical foundations remain poorly understood. In this paper, we make the following observation: \emph{under a Gaussian kernel, the drift operator is exactly a score difference on smoothed distributions}. This insight allows us to answer all three key questions left open in the original work: (1) whether a vanishing drift guarantees equality of distributions ($V_{p,q}=0\Rightarrow p=q$), (2) how to choose between kernels, and (3) why the stop-gradient operator is indispensable for stable training. Our observations position drifting within the well-studied score-matching family and enable a rich theoretical perspective. By linearizing the McKean-Vlasov dynamics and probing them in Fourier space, we reveal frequency-dependent convergence timescales comparable to \emph{Landau damping} in plasma kinetic theory: the Gaussian kernel suffers an exponential high-frequency bottleneck, explaining the empirical preference for the Laplacian kernel. We also propose an exponential bandwidth annealing schedule $\sigma(t)=\sigma_0 e^{-rt}$ that reduces convergence time from $\exp(O(K_{\max}^2))$ to $O(\log K_{\max})$. Finally, by formalizing drifting as a Wasserstein gradient flow of the smoothed KL divergence, we prove that the stop-gradient operator is derived directly from the frozen-field discretization mandated by the JKO scheme, and removing it severs training from any gradient-flow guarantee. This variational perspective further provides a general template for constructing novel drift operators, demonstrated with a Sinkhorn divergence drift.
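The claimed $O(\log K_{\max})$ convergence of the annealing schedule $\sigma(t)=\sigma_0 e^{-rt}$ follows from solving for the time at which the bandwidth reaches the resolution scale $1/K_{\max}$ of the highest frequency. A small sketch of that calculation (the function name and the choice of $1/K_{\max}$ as target scale are assumptions for illustration):

```python
import math

def anneal_time(sigma0, r, k_max):
    # time at which sigma(t) = sigma0 * exp(-r t) reaches the scale 1 / k_max:
    # sigma0 * exp(-r t) = 1 / k_max  =>  t = log(sigma0 * k_max) / r
    return math.log(sigma0 * k_max) / r

# the required time grows only logarithmically in k_max
for k in (10, 100, 1000, 10000):
    print(k, anneal_time(1.0, 0.5, k))
```

Each tenfold increase in $K_{\max}$ adds only a constant $\log(10)/r$ to the schedule length, in contrast to the $\exp(O(K_{\max}^2))$ bottleneck of a fixed Gaussian bandwidth described in the abstract.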
Problem

Research questions and friction points this paper is trying to address.

Generative Drifting
Score Matching
Kernel Selection
Stop-Gradient
Distribution Equality
Innovation

Methods, ideas, or system contributions that make the work stand out.

score matching
generative drifting
Wasserstein gradient flow
spectral analysis
kernel annealing