On Variance Reduction in Learning Mean Flows

📅 2026-05-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the instability of MeanFlow training, which shows up as non-decreasing loss and unbounded gradient variance. It attributes the instability to a misuse of the conditional velocity field, which plays two distinct statistical roles in the loss: an unbiased regression target and a Monte Carlo control variate inside a Jacobian-vector product, with the original loss assigning the wrong coefficient to the latter. The study derives the optimal coefficient in closed form and shows that a family of fixes from concurrent works are practical realizations of the same optimum. On two-dimensional benchmarks, sample quality improves by up to 54%; on a latent Diffusion Transformer (DiT), FID follows a monotone trend at every matched-step checkpoint. The DiT measurement also reveals a quantitative mismatch between the FID and MSE landscapes: gradient variance is minimized at an interior coefficient value, whereas the FID-minimizing coefficient favors direct use of the conditional velocity.

📝 Abstract

One-step generative modeling has emerged as a leading approach to amortize the inference cost of diffusion and flow-matching models. Among distillation-free methods, MeanFlow training is notoriously unstable, with non-decreasing loss and unbounded gradient variance. In this work, we establish a theory that attributes this pathology to a misuse of the conditional velocity field: it plays two distinct statistical roles in the loss, both as an unbiased regression target and as a Monte Carlo control variate inside a Jacobian-vector product, with the original loss assigning the wrong coefficient to the latter. We derive the optimal coefficient in closed form, and show that a family of fixes in concurrent works corresponds to different practical realizations of the same optimum. A controlled sweep of this coefficient on two-dimensional benchmarks and on a latent Diffusion Transformer recovers the predicted bias-variance ordering. The optimal coefficient yields up to a 54% improvement in sample quality on two-dimensional benchmarks and a monotone FID trend at every matched-step DiT checkpoint. Crucially, the same DiT measurement also reveals a quantitative FID-MSE landscape mismatch: although gradient variance is minimized at an interior coefficient value, the coefficient that minimizes FID prefers the direct use of conditional velocity.
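
The abstract's notion of a control-variate coefficient with an interior variance minimum can be illustrated with the textbook Monte Carlo setting. The sketch below is not the paper's loss or its closed-form coefficient; it is a toy example under assumed choices (the integrand `exp(x)`, the control variate `x` on a uniform distribution, and the helper names `optimal_coefficient` and `control_variate_variance` are all hypothetical). It uses the standard result that the variance of `f - c * (g - E[g])` is minimized at `c* = Cov(f, g) / Var(g)`, then sweeps `c` around that value.

```python
import numpy as np


def control_variate_variance(f_vals, g_vals, g_mean, c):
    """Per-sample variance of the control-variate estimator f - c * (g - E[g])."""
    return np.var(f_vals - c * (g_vals - g_mean))


def optimal_coefficient(f_vals, g_vals):
    """Textbook variance-minimizing coefficient: Cov(f, g) / Var(g)."""
    cov = np.cov(f_vals, g_vals)
    return cov[0, 1] / cov[1, 1]


# Toy setup (assumption, not from the paper): estimate E[exp(X)] for X ~ U(0, 1),
# using g(X) = X as the control variate, whose mean 0.5 is known exactly.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=100_000)
f_vals = np.exp(x)
g_vals = x

c_star = optimal_coefficient(f_vals, g_vals)

# Sweep the coefficient around the optimum: no correction (c = 0),
# under-correction, the optimum, and two levels of over-correction.
coeffs = np.array([0.0, 0.5 * c_star, c_star, 1.5 * c_star, 2.0 * c_star])
variances = np.array(
    [control_variate_variance(f_vals, g_vals, 0.5, c) for c in coeffs]
)

for c, v in zip(coeffs, variances):
    print(f"c = {c:.3f}  variance = {v:.5f}")
```

In this toy case the variance is a quadratic in `c`, so it falls as `c` approaches `c*` and rises again beyond it; a coefficient that is too small or too large both leave variance on the table. The abstract's additional point, that the variance-minimizing coefficient need not be the one that minimizes FID, is an empirical finding that a sketch like this cannot reproduce.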

Problem

Research questions and friction points this paper is trying to address.

MeanFlow

variance reduction

generative modeling

gradient variance

one-step generative modeling

Innovation

Methods, ideas, or system contributions that make the work stand out.

variance reduction

MeanFlow

control variate