🤖 AI Summary
This work addresses the semantic degradation of Flow Matching models when generating out-of-distribution or minority-class samples, a failure mode driven by dataset bias. The authors formalize the concept of a “Bias Manifold” and identify a trajectory lock-in mechanism induced by conditional expectation smoothing. To counteract it, they propose InjectFlow, an orthogonal semantic injection strategy that operates entirely at inference time and requires no retraining, architectural modifications, or changes to random seeds. By injecting orthogonal semantics when the initial velocity field is computed, the method prevents latent trajectories from drifting toward majority modes. On GenEval, the approach fixes 75% of the prompts that the baseline flow matching model fails to generate correctly, improving generation fairness and robustness for minority-class samples.
📝 Abstract
Flow Matching (FM) has recently emerged as a leading approach for high-fidelity visual generation, offering a robust continuous-time alternative to diffusion-based models. Despite this success, FM models are highly sensitive to dataset biases, which cause severe semantic degradation when generating out-of-distribution or minority-class samples. In this paper, we provide a rigorous mathematical formalization of the “Bias Manifold” within the FM framework. We identify that this performance drop is driven by conditional expectation smoothing, a mechanism that leads to trajectory lock-in during inference. To resolve this, we introduce InjectFlow, a novel, training-free method that injects orthogonal semantics during the initial velocity field computation, without requiring any changes to the random seeds. This design prevents latent drift toward majority modes while maintaining high generative quality. Extensive experiments demonstrate the effectiveness of our approach. Notably, on the GenEval benchmark, InjectFlow fixes 75% of the prompts that standard flow matching models fail to generate correctly. Together, our theoretical analysis and algorithm provide a ready-to-use solution for building fairer and more robust visual foundation models.
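The abstract does not spell out the injection rule, so the following is only a minimal sketch of one plausible reading: during Euler integration of the flow ODE, the component of a semantic direction that is orthogonal to the initial velocity is added at the first step only, with the starting noise left untouched. The names `orthogonal_component`, `sample`, `semantic_dir`, and `strength`, as well as the toy velocity field, are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def orthogonal_component(s, v, eps=1e-12):
    """Return the part of s that is orthogonal to v (one Gram-Schmidt step)."""
    s = np.asarray(s, dtype=float)
    v = np.asarray(v, dtype=float)
    return s - (np.dot(s, v) / (np.dot(v, v) + eps)) * v


def sample(velocity_fn, x0, semantic_dir=None, strength=0.0, n_steps=50):
    """Euler integration of dx/dt = v(x, t) from t = 0 to t = 1.

    Hypothetical injection rule (an assumption, not the paper's algorithm):
    at the first step only, add the component of semantic_dir that is
    orthogonal to the initial velocity, scaled by strength.
    """
    x = np.array(x0, dtype=float)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        v = velocity_fn(x, t)
        if k == 0 and semantic_dir is not None:
            v = v + strength * orthogonal_component(semantic_dir, v)
        x = x + dt * v
    return x


# Toy stand-in for a trained velocity field: every point is pulled toward a
# single "majority mode". A real FM model would be a neural network.
MAJORITY_MODE = np.array([0.0, 0.0])


def toy_velocity(x, t):
    return MAJORITY_MODE - x


# Same starting noise for both runs, so no random seed is changed.
x_start = np.array([1.0, 0.0])
baseline = sample(toy_velocity, x_start)
injected = sample(toy_velocity, x_start, semantic_dir=[1.0, 1.0], strength=5.0)
```

In this toy setting the two runs share the same starting point, and `injected - baseline` lies entirely along the direction orthogonal to the initial velocity. The sketch illustrates the mechanics of a first-step orthogonal perturbation only; it says nothing about the GenEval results reported above.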