Posterior Augmented Flow Matching

📅 2026-05-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

In high-dimensional image generation, flow matching can suffer from flow collapse and poor generalization because each training sample supervises only a single trajectory, which makes the training signal sparse and high-variance. This work proposes Posterior-Augmented Flow Matching (PAFM), which replaces single-target supervision with an expectation over an approximate posterior of target completions for a given intermediate state and condition. The posterior is factorized into the likelihood of the intermediate state under a candidate endpoint and the prior probability of that endpoint under the condition, and importance sampling builds a mixture over multiple candidate targets. The authors prove that the resulting estimator of the original flow matching objective is unbiased and has substantially lower gradient variance. PAFM improves FID50K by up to 3.4 points on ImageNet and CC12M across model scales (SiT-B/2 and SiT-XL/2) and architectures (SiT and MMDiT), with a negligible increase in compute.
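The posterior expectation described above can be written out as a short derivation. This is a sketch under assumed notation, not the paper's own: it assumes the linear path $x_t = (1-t)\,x_0 + t\,x_1$ with $x_0 \sim \mathcal{N}(0, I)$, a proposal distribution $q$ over endpoints, and $K$ candidate endpoints $x_1^{(k)}$. The symbols $v_\theta$, $q$, $w_k$, and $K$ are illustrative.

```latex
% Standard conditional flow matching loss (single target per intermediate):
\mathcal{L}_{\mathrm{FM}}(\theta)
  = \mathbb{E}_{t,\,x_0,\,x_1}
    \bigl\| v_\theta(x_t, t, c) - (x_1 - x_0) \bigr\|^2,
  \qquad x_t = (1-t)\,x_0 + t\,x_1 .

% Under this path, x_1 - x_0 = (x_1 - x_t)/(1-t), and
% p(x_t \mid x_1) = \mathcal{N}\bigl(t\,x_1,\,(1-t)^2 I\bigr).

% Posterior over endpoints, factorized by Bayes' rule:
p(x_1 \mid x_t, c) \;\propto\; p(x_t \mid x_1)\; p(x_1 \mid c).

% Posterior-averaged target, approximated with K candidates x_1^{(k)} \sim q:
\bar{u}(x_t, t, c)
  = \mathbb{E}_{x_1 \sim p(\cdot \mid x_t, c)}
    \!\left[ \frac{x_1 - x_t}{1-t} \right]
  \;\approx\; \sum_{k=1}^{K} w_k \, \frac{x_1^{(k)} - x_t}{1-t},
\qquad
w_k \;\propto\;
  \frac{p\bigl(x_t \mid x_1^{(k)}\bigr)\, p\bigl(x_1^{(k)} \mid c\bigr)}
       {q\bigl(x_1^{(k)}\bigr)},
\quad \sum_{k} w_k = 1 .
```

With $K = 1$ the weighted sum reduces to the usual single-target velocity $x_1 - x_0$. The exact weighting and normalization used by PAFM may differ from this self-normalized form.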

📝 Abstract

Flow matching (FM) trains a time-dependent vector field that transports samples from a simple prior to a complex data distribution. However, for high-dimensional images, each training sample supervises only a single trajectory and a single intermediate point, yielding an extremely sparse and high-variance training signal. This under-constrained supervision can cause flow collapse, where the learned dynamics memorize specific source-target pairings, map diverse inputs to overly similar outputs, and fail to generalize. We introduce Posterior-Augmented Flow Matching (PAFM), a theoretically grounded generalization of FM that replaces single-target supervision with an expectation over an approximate posterior of valid target completions for a given intermediate state and condition. PAFM factorizes this intractable posterior into (i) the likelihood of the intermediate state under a hypothesized endpoint and (ii) the prior probability of that endpoint under the condition, and uses an importance sampling scheme to construct a mixture over multiple candidate targets. We prove that PAFM yields an unbiased estimator of the original FM objective while substantially reducing gradient variance during training by aggregating information from many plausible continuation trajectories per intermediate state. Finally, we show that PAFM improves over FM by up to 3.4 FID50K points across model scales (SiT-B/2 and SiT-XL/2), architectures (SiT and MMDiT), and both class-conditioned and text-conditioned benchmarks (ImageNet and CC12M), with a negligible increase in compute overhead. Code: https://github.com/gstoica27/PAFM.git.
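To make the posterior-weighted target concrete, here is a minimal NumPy sketch. It is not the authors' implementation (see the linked repository for that). It assumes the linear path x_t = (1 - t) * x0 + t * x1 with Gaussian x0, and it uses self-normalized importance weights, which is a simplification that may differ from the paper's estimator. The function names `posterior_weights` and `pafm_style_target` and the argument `log_prior_ratio` are hypothetical.

```python
import numpy as np

def posterior_weights(x_t, t, candidates, log_prior_ratio=None):
    """Self-normalized weights w_k proportional to p(x_t | x1_k) * p(x1_k | c) / q(x1_k).

    Assumes x_t = (1 - t) * x0 + t * x1 with x0 ~ N(0, I), so that
    p(x_t | x1) = N(t * x1, (1 - t)^2 I). Requires 0 <= t < 1.
    `log_prior_ratio` holds log p(x1_k | c) - log q(x1_k); pass None when the
    candidates were drawn from p(x1 | c) itself, in which case the ratio cancels.
    """
    sigma = 1.0 - t
    # Squared distance between x_t and the mean t * x1_k of each candidate.
    sq_dist = np.sum((x_t[None, :] - t * candidates) ** 2, axis=1)
    # Gaussian log-likelihood up to a constant shared by all candidates.
    log_w = -0.5 * sq_dist / sigma**2
    if log_prior_ratio is not None:
        log_w = log_w + log_prior_ratio
    # Subtract the max before exponentiating for numerical stability.
    log_w = log_w - np.max(log_w)
    w = np.exp(log_w)
    return w / np.sum(w)

def pafm_style_target(x_t, t, candidates, log_prior_ratio=None):
    """Weighted mixture of per-candidate velocities (x1_k - x_t) / (1 - t)."""
    w = posterior_weights(x_t, t, candidates, log_prior_ratio)
    velocities = (candidates - x_t[None, :]) / (1.0 - t)
    return w @ velocities

# Usage: four candidate endpoints in 8 dimensions, one of which generated x_t.
rng = np.random.default_rng(0)
x0 = rng.standard_normal(8)
candidates = rng.standard_normal((4, 8))
t = 0.3
x_t = (1.0 - t) * x0 + t * candidates[0]
target = pafm_style_target(x_t, t, candidates)  # shape (8,)
```

With a single candidate the target reduces to the standard FM velocity x1 - x0, which is a useful sanity check. In high dimensions the weights tend to concentrate on the candidate that generated x_t unless t is small, so how the candidate pool is built matters in practice.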

Problem

Research questions and friction points this paper is trying to address.

flow collapse

high-dimensional images

sparse supervision

generalization

training signal variance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Posterior-Augmented Flow Matching

Flow Matching

gradient variance reduction