🤖 AI Summary
This work addresses the high estimator variance and inefficient sampling of continuous-time generative models, which arise from training objectives that treat each timestep independently. The authors propose Temporal Pair Consistency (TPC), a lightweight variance-reduction principle that couples velocity predictions at paired timesteps along the same probability path, without altering the model architecture, probability path, or solver. Their theoretical analysis shows that TPC induces a quadratic, trajectory-coupled regularization that provably reduces gradient variance while preserving the original flow-matching objective. Instantiated within flow matching, TPC improves sample quality and efficiency on CIFAR-10 and ImageNet at multiple resolutions, achieving lower FID at identical or lower computational cost than prior methods. It also extends to modern SOTA-style pipelines that use noise-augmented training, score-based denoising, and rectified flow.
📝 Abstract
Continuous-time generative models, such as diffusion models, flow matching, and rectified flow, learn time-dependent vector fields but are typically trained with objectives that treat timesteps independently, leading to high estimator variance and inefficient sampling. Prior approaches mitigate this via explicit smoothness penalties, trajectory regularization, or modified probability paths and solvers. We introduce Temporal Pair Consistency (TPC), a lightweight variance-reduction principle that couples velocity predictions at paired timesteps along the same probability path, operating entirely at the estimator level without modifying the model architecture, probability path, or solver. We provide a theoretical analysis showing that TPC induces a quadratic, trajectory-coupled regularization that provably reduces gradient variance while preserving the underlying flow-matching objective. Instantiated within flow matching, TPC improves sample quality and efficiency across CIFAR-10 and ImageNet at multiple resolutions, achieving lower FID at identical or lower computational cost than prior methods, and extends seamlessly to modern SOTA-style pipelines with noise-augmented training, score-based denoising, and rectified flow.
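To make the idea of coupling paired timesteps concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not give the exact form of the regularizer, so everything here is an assumption made for illustration: the linear path x_t = (1 - t) x0 + t x1, the squared-difference pair penalty, the weight `lam`, and the names `interpolate`, `sq_dist`, and `tpc_loss`.

```python
def interpolate(x0, x1, t):
    """Point on the (assumed) linear path x_t = (1 - t) * x0 + t * x1."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def tpc_loss(velocity_fn, x0, x1, t, t_pair, lam=0.1):
    """Flow-matching loss at two timesteps of one trajectory, plus a
    quadratic penalty tying the two velocity predictions together.

    Hypothetical formulation: `lam` and the penalty form are assumptions.
    Returns (total, flow_matching_term, pair_term).
    """
    # On a linear path the conditional target velocity is x1 - x0 for all t.
    target = [b - a for a, b in zip(x0, x1)]

    # Both predictions share the same endpoints (x0, x1), so they lie on
    # the same trajectory; only the timestep differs.
    v_t = velocity_fn(interpolate(x0, x1, t), t)
    v_s = velocity_fn(interpolate(x0, x1, t_pair), t_pair)

    # Standard flow-matching regression, averaged over the two timesteps.
    fm = 0.5 * (sq_dist(v_t, target) + sq_dist(v_s, target))

    # Quadratic pair-consistency term coupling the two predictions.
    pair = sq_dist(v_t, v_s)
    return fm + lam * pair, fm, pair


# Toy usage: a "model" whose output depends only on t (a stand-in for a network).
x0, x1 = [0.0, 0.0], [1.0, 2.0]
total, fm, pair = tpc_loss(lambda x, t: [t, t], x0, x1, 0.25, 0.75, lam=0.5)
```

In this sketch the pair term only changes how the loss is estimated from a sampled trajectory; the model, the path, and any downstream solver are untouched, which mirrors the estimator-level framing in the abstract. Whether this particular penalty has the variance-reduction and objective-preservation properties claimed for TPC is not something the sketch establishes.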