On Variance Reduction in Learning Mean Flows

📅 2026-05-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the instability of MeanFlow training, which shows up as non-decreasing loss and unbounded gradient variance. It attributes the instability to the conditional velocity field playing two distinct statistical roles in the loss: an unbiased regression target and a Monte Carlo control variate inside a Jacobian-vector product, with the original loss assigning the wrong coefficient to the latter. The authors derive the optimal coefficient in closed form and show that several fixes proposed in concurrent works are practical realizations of the same optimum. On two-dimensional benchmarks, the optimal coefficient improves sample quality by up to 54%; on a latent Diffusion Transformer (DiT), it yields a monotone FID trend at every matched-step checkpoint. The DiT measurements also reveal a quantitative FID-MSE mismatch: gradient variance is minimized at an interior coefficient value, whereas the coefficient that minimizes FID favors direct use of the conditional velocity.
📝 Abstract
One-step generative modeling has emerged as a leading approach to amortize the inference cost of diffusion and flow-matching models. Among distillation-free methods, MeanFlow training is notoriously unstable, with non-decreasing loss and unbounded gradient variance. In this work, we establish a theory that attributes this pathology to a misuse of the conditional velocity field: it plays two distinct statistical roles in the loss, both as an unbiased regression target and as a Monte Carlo control variate inside a Jacobian-vector product, with the original loss assigning the wrong coefficient to the latter. We derive the optimal coefficient in closed form, and show that a family of fixes in concurrent works corresponds to different practical realizations of the same optimum. A controlled sweep of this coefficient on two-dimensional benchmarks and on a latent Diffusion Transformer (DiT) recovers the predicted bias-variance ordering. The optimal coefficient yields up to a 54% improvement in sample quality on two-dimensional benchmarks and a monotone FID trend at every matched-step DiT checkpoint. Crucially, the same DiT measurement also reveals a quantitative FID-MSE landscape mismatch: although gradient variance is minimized at an interior coefficient value, the coefficient that minimizes FID prefers the direct use of conditional velocity.
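To make the control-variate idea concrete, the following is a minimal sketch of the textbook scalar case, not the paper's MeanFlow loss or its closed-form result. It assumes a toy target E[exp(U)] with U uniform on [0, 1] and uses g(U) = U (known mean 0.5) as the control variate; the function name `control_variate_demo` and all parameters are hypothetical. The classical variance-minimizing coefficient is c* = Cov(f, g) / Var(g), and the sketch compares it against two fixed choices, c = 0 (no correction) and c = 1.

```python
import math
import random
import statistics


def control_variate_demo(n=20000, seed=0):
    """Toy illustration (assumed setup, not from the paper).

    Estimates E[exp(U)], U ~ Uniform(0, 1), using g(U) = U as a control
    variate with known mean 0.5. Returns the estimated optimal coefficient
    and the per-sample variance of f - c * (g - 0.5) for three values of c.
    """
    rng = random.Random(seed)
    u = [rng.random() for _ in range(n)]
    f = [math.exp(x) for x in u]  # quantity whose mean we want
    g = u                         # control variate, E[g] = 0.5 is known

    mean_f = statistics.fmean(f)
    mean_g = statistics.fmean(g)
    # Sample covariance and variance with the same (n - 1) normalization.
    cov_fg = sum((a - mean_f) * (b - mean_g) for a, b in zip(f, g)) / (n - 1)
    var_g = statistics.variance(g)

    # Classical variance-minimizing coefficient for a scalar control variate.
    c_star = cov_fg / var_g

    def per_sample_variance(c):
        # The corrected samples keep the same expectation for every c,
        # because E[g - 0.5] = 0; only their variance changes.
        return statistics.variance([a - c * (b - 0.5) for a, b in zip(f, g)])

    variances = {
        "none": per_sample_variance(0.0),       # no control variate
        "unit": per_sample_variance(1.0),       # a fixed, non-optimal coefficient
        "optimal": per_sample_variance(c_star)  # variance-minimizing coefficient
    }
    return c_star, variances


if __name__ == "__main__":
    c_star, variances = control_variate_demo()
    print("estimated optimal coefficient:", c_star)
    for name, value in variances.items():
        print(name, value)
```

In this toy problem every coefficient gives an unbiased estimate, but a fixed coefficient such as 1 leaves variance on the table compared with c*. This mirrors, in a much simpler setting, the abstract's point that a wrongly chosen coefficient on a control variate inflates variance.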
Problem

Research questions and friction points this paper is trying to address.

MeanFlow
variance reduction
generative modeling
gradient variance
one-step generative modeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

variance reduction
MeanFlow
control variate
conditional velocity field
generative modeling