Stable Velocity: A Variance Perspective on Flow Matching

📅 2026-02-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the high variance in flow matching training objectives, which stems from reliance on single-sample conditional velocity estimates and leads to unstable optimization and slow convergence. To mitigate this, the authors propose Stable Velocity, a unified framework that reduces training noise through an unbiased variance-reduction objective (StableVM), stabilizes optimization via a variance-aware auxiliary supervision mechanism (VA-REPA), and enables fast, finetuning-free generation with a closed-form sampling strategy termed StableVS. The method significantly improves training efficiency on ImageNet 256×256 and several large pretrained models, achieving more than 2× faster sampling while preserving generation quality.
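The "single-sample conditional velocity" noise the summary refers to can be seen in a toy experiment. The sketch below uses a standard rectified-flow convention (x_t = (1−t)·x0 + t·x1, regression target v = x1 − x0) on a 1D two-point "data" distribution; all names and the binning procedure are illustrative, not the paper's code. It estimates how much the training target varies among samples that share (almost) the same x_t, near the prior versus near the data.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000  # Monte-Carlo sample count

def conditional_target_variance(t, n_bins=60):
    """Estimate E_{x_t}[ Var(v | x_t) ] for a 1D rectified flow (illustrative).

    Convention: x_t = (1-t)*x0 + t*x1 with x0 ~ N(0,1) (prior) and x1 drawn
    from a two-point toy data distribution; the single-sample conditional
    velocity target is v = x1 - x0.
    """
    x0 = rng.standard_normal(N)
    x1 = rng.choice([-2.0, 2.0], size=N)  # toy "data" distribution
    xt = (1 - t) * x0 + t * x1
    v = x1 - x0                           # single-sample conditional target
    # Bin x_t and measure the spread of targets that share nearly the same x_t
    bins = np.linspace(xt.min(), xt.max(), n_bins + 1)
    idx = np.digitize(xt, bins)
    vars_, weights = [], []
    for b in np.unique(idx):
        mask = idx == b
        if mask.sum() > 50:               # skip nearly empty bins
            vars_.append(v[mask].var())
            weights.append(mask.sum())
    return float(np.average(vars_, weights=weights))

var_near_prior = conditional_target_variance(t=0.1)  # high-variance regime
var_near_data = conditional_target_variance(t=0.9)   # low-variance regime
print(var_near_prior, var_near_data)
```

Consistent with the two-regime picture, the target variance near the prior (t = 0.1) comes out much larger than near the data (t = 0.9): close to the data, conditioning on x_t nearly pins down the pair (x0, x1), so the conditional and marginal velocities nearly coincide.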

📝 Abstract
While flow matching is elegant, its reliance on single-sample conditional velocities leads to high-variance training targets that destabilize optimization and slow convergence. By explicitly characterizing this variance, we identify 1) a high-variance regime near the prior, where optimization is challenging, and 2) a low-variance regime near the data distribution, where conditional and marginal velocities nearly coincide. Leveraging this insight, we propose Stable Velocity, a unified framework that improves both training and sampling. For training, we introduce Stable Velocity Matching (StableVM), an unbiased variance-reduction objective, along with Variance-Aware Representation Alignment (VA-REPA), which adaptively strengthens auxiliary supervision in the low-variance regime. For inference, we show that dynamics in the low-variance regime admit closed-form simplifications, enabling Stable Velocity Sampling (StableVS), a finetuning-free acceleration. Extensive experiments on ImageNet $256\times256$ and large pretrained text-to-image and text-to-video models, including SD3.5, Flux, Qwen-Image, and Wan2.2, demonstrate consistent improvements in training efficiency and more than $2\times$ faster sampling within the low-variance regime without degrading sample quality. Our code is available at https://github.com/linYDTHU/StableVelocity.
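The abstract's inference-side idea, taking the sampler's remaining steps cheaply once the dynamics enter the low-variance regime, can be sketched generically. The code below is NOT the paper's StableVS; it is a minimal two-regime Euler sampler assuming a straight-path (rectified-flow) convention where the learned velocity field is nearly constant past a switch time `t_switch` (both the switch time and step counts are illustrative parameters).

```python
import numpy as np

def two_regime_sampler(velocity_fn, x, t_switch=0.7, fine_steps=20):
    """Illustrative two-regime ODE sampler (not the paper's StableVS).

    Integrates dx/dt = v(x, t) from t=0 (prior) to t=1 (data): small Euler
    steps in the high-variance regime [0, t_switch], then a single large
    step across the low-variance regime (t_switch, 1], where the velocity
    is assumed to be nearly constant along the path.
    """
    ts = np.linspace(0.0, t_switch, fine_steps + 1)
    for t0, t1 in zip(ts[:-1], ts[1:]):
        x = x + (t1 - t0) * velocity_fn(x, t0)  # fine Euler step
    # One big step through the low-variance regime
    x = x + (1.0 - t_switch) * velocity_fn(x, t_switch)
    return x

# Sanity check: for an exactly constant velocity field the big step is lossless
x0 = np.zeros(4)
out = two_regime_sampler(lambda x, t: np.full_like(x, 3.0), x0)
print(out)  # total displacement = 1.0 * 3.0 in every coordinate
```

For a truly straight path the large final step introduces no error at all; the paper's contribution is characterizing when (and in what closed form) this regime can be exploited for pretrained models without finetuning.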
Problem

Research questions and friction points this paper is trying to address.

flow matching
high-variance training
conditional velocity
optimization instability
convergence speed
Innovation

Methods, ideas, or system contributions that make the work stand out.

flow matching
variance reduction
Stable Velocity
efficient sampling
representation alignment