Gradient Variance Reveals Failure Modes in Flow-Based Generative Models

📅 2025-10-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Deterministic training of flow-based generative models suffers from excessively low gradient variance, which drives pathological memorization: the model rigidly reproduces its training pairs rather than generalizing, even when the source-target interpolation paths intersect. Method: We establish gradient variance as a key analytical metric for this behavior, and combine Gaussian optimal transport theory with ODE vector-field analysis to prove that deterministic optimization converges to an ill-defined, memorizing solution. To break this degeneracy, we propose a noise-injection strategy that introduces minimal stochasticity into the training interpolants, steering optimization away from the memorizing field. Contribution/Results: Experiments on CelebA demonstrate that vanishingly small noise suffices to substantially suppress memorization and restore faithful generalization across interpolants. This work provides both a novel theoretical framework for interpreting flow-model behavior and a practical, robust design principle grounded in gradient-variance analysis.
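
A minimal sketch of the training objective the summary describes, assuming a toy PyTorch setup: the straight-line interpolant between paired source and target samples is regressed onto the constant velocity x1 − x0, and a `noise_std` knob adds the small stochasticity the paper proposes (set it to 0 to recover the deterministic regime). The network, data, and hyperparameters below are illustrative, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): Rectified Flow training on toy
# Gaussians, with optional noise injection into the straight-line interpolant.
import torch
import torch.nn as nn

class VelocityMLP(nn.Module):
    """Toy vector field v_theta(x, t); the architecture is illustrative only."""
    def __init__(self, dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([x, t], dim=-1))

def rectified_flow_loss(model, x0, x1, noise_std: float = 0.0):
    """Straight-path regression loss. noise_std = 0 is the deterministic regime;
    a small positive value adds the stochasticity that breaks memorization."""
    t = torch.rand(x0.shape[0], 1)                     # t ~ U[0, 1]
    x_t = (1 - t) * x0 + t * x1                        # straight-line interpolant
    if noise_std > 0:
        x_t = x_t + noise_std * torch.randn_like(x_t)  # minimal noise injection
    return ((model(x_t, t) - (x1 - x0)) ** 2).mean()   # regress onto x1 - x0

if __name__ == "__main__":
    model = VelocityMLP(dim=2)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(1000):
        x0 = torch.randn(256, 2)           # source batch
        x1 = torch.randn(256, 2) + 3.0     # target batch (shifted Gaussian)
        loss = rectified_flow_loss(model, x0, x1, noise_std=0.05)
        opt.zero_grad(); loss.backward(); opt.step()
```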

📝 Abstract
Rectified Flows learn ODE vector fields whose trajectories are straight between source and target distributions, enabling near one-step inference. We show that this straight-path objective conceals fundamental failure modes: under deterministic training, low gradient variance drives memorization of arbitrary training pairings, even when interpolant lines between pairs intersect. To analyze this mechanism, we study Gaussian-to-Gaussian transport and use the loss gradient variance across stochastic and deterministic regimes to characterize which vector fields optimization favors in each setting. We then show that, in a setting where all interpolating lines intersect, applying Rectified Flow yields the same specific pairings at inference as during training. More generally, we prove that a memorizing vector field exists even when training interpolants intersect, and that optimizing the straight-path objective converges to this ill-defined field. At inference, deterministic integration reproduces the exact training pairings. We validate our findings empirically on the CelebA dataset, confirming that deterministic interpolants induce memorization, while the injection of small noise restores generalization.
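
The abstract's central diagnostic compares the loss-gradient variance between a regime where source-target pairings are held fixed (deterministic) and one where they are freshly re-drawn each batch (stochastic). The following is a self-contained, hypothetical sketch of that measurement on toy Gaussian data; the model, data, and function names are assumptions for illustration, not the paper's code.

```python
# Hypothetical diagnostic sketch: estimate the variance of the straight-path
# loss gradient under fixed (deterministic) vs. freshly re-drawn (stochastic)
# source-target pairings.
import torch
import torch.nn as nn

def make_toy_field(dim: int = 2, hidden: int = 64) -> nn.Module:
    """Tiny vector-field network taking [x, t] concatenated as input."""
    return nn.Sequential(nn.Linear(dim + 1, hidden), nn.SiLU(), nn.Linear(hidden, dim))

def straight_path_loss(field: nn.Module, x0: torch.Tensor, x1: torch.Tensor) -> torch.Tensor:
    """Deterministic Rectified-Flow objective on one batch of pairs."""
    t = torch.rand(x0.shape[0], 1)
    x_t = (1 - t) * x0 + t * x1
    return ((field(torch.cat([x_t, t], dim=-1)) - (x1 - x0)) ** 2).mean()

def gradient_variance(field: nn.Module, sample_pairs, n_draws: int = 64) -> float:
    """Mean per-parameter variance of the flattened loss gradient over n_draws batches."""
    grads = []
    for _ in range(n_draws):
        x0, x1 = sample_pairs()
        loss = straight_path_loss(field, x0, x1)
        g = torch.autograd.grad(loss, list(field.parameters()))
        grads.append(torch.cat([gi.reshape(-1) for gi in g]))
    return torch.stack(grads).var(dim=0).mean().item()

field = make_toy_field()

# Deterministic regime: the same fixed pairing on every draw (only t varies).
x0_fixed, x1_fixed = torch.randn(256, 2), torch.randn(256, 2) + 3.0
print("fixed pairings:  ", gradient_variance(field, lambda: (x0_fixed, x1_fixed)))

# Stochastic regime: independent pairings on every draw.
print("random pairings: ", gradient_variance(field, lambda: (torch.randn(256, 2),
                                                             torch.randn(256, 2) + 3.0)))
```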
Problem

Research questions and friction points this paper is trying to address.

Rectified Flows conceal failure modes through straight-path objectives
Low gradient variance causes memorization of arbitrary training pairings
Deterministic training reproduces exact training pairings instead of generalizing (illustrated in the sketch below)
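
The last point above concerns inference-time behavior: deterministically integrating the learned ODE maps each training source point back onto its original training partner. Below is a hedged sketch of that check, assuming a trained vector field `velocity_model(x, t)` such as the toy `VelocityMLP` in the first sketch; all names are illustrative.

```python
# Illustrative memorization check (hypothetical names, not the authors' code):
# deterministic forward Euler integration of dx/dt = v_theta(x, t) from t=0 to t=1.
import torch

@torch.no_grad()
def integrate_flow(velocity_model, x0: torch.Tensor, n_steps: int = 100) -> torch.Tensor:
    """Push source samples x0 through the learned ODE with fixed-step Euler."""
    x = x0.clone()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = torch.full((x.shape[0], 1), k * dt)
        x = x + dt * velocity_model(x, t)
    return x

# If training used fixed pairs (x0_i, x1_i), a memorizing field sends each x0_i
# almost exactly onto its own x1_i; small training noise should break this.
# reconstruction = integrate_flow(trained_model, x0_train)
# pairing_gap = (reconstruction - x1_train).norm(dim=-1).mean()
```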
Innovation

Methods, ideas, or system contributions that make the work stand out.

Rectified Flows learn straight-path ODE vector fields
Gradient variance analysis reveals memorization failure modes
Noise injection restores generalization in deterministic training