🤖 AI Summary
Existing conditional flow-based generative models employ a generic unimodal noise prior, leading to unnecessarily long source-to-target mapping paths and inefficient sampling. To address this, we propose the **Conditional Centralized Prior (CCP)**: a conditional encoder maps textual or other prompts to modality-specific centers in data space, from which class-adaptive Gaussian priors are constructed, marking the first dynamic prior customization within the flow matching framework. CCP significantly accelerates training convergence and reduces the number of sampling steps while achieving superior FID, KID, and CLIP Score results versus baselines, balancing generation quality and efficiency. Our core contribution is to depart from the conventional fixed-prior paradigm by integrating prior design into the conditional modeling process, thereby enhancing both the geometric plausibility and computational efficiency of flow-based generative models.
📝 Abstract
Flow-based generative models have recently shown impressive performance on conditional generation tasks, such as text-to-image generation. However, current methods transform a generic unimodal noise distribution to a specific mode of the target data distribution. As such, every point in the initial source distribution can be mapped to every point in the target distribution, resulting in long average paths. In this work, we tap into an underutilized property of conditional flow-based models: the ability to design a non-trivial prior distribution. Given an input condition, such as a text prompt, we first map it to a point in data space representing an "average" data point with the minimal average distance to all data points of the same conditional mode (e.g., class). We then use the flow matching formulation to map samples from a parametric distribution centered around this point to the conditional target distribution. Experimentally, our method significantly improves training time and generation efficiency (FID, KID, and CLIP alignment scores) compared to baselines, producing high-quality samples with fewer sampling steps.
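The core idea above can be sketched in a few lines: instead of sampling the source point from a standard normal prior, sample it from a Gaussian centered at a per-condition "average" point, then regress on the usual flow matching velocity target. The toy sketch below uses fixed 2D class centers and a linear (rectified-flow style) interpolation path; the centers, `sigma`, and all variable names are illustrative assumptions, not the paper's actual encoder or hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-class "average" centers in data space (in the paper these
# would come from a conditional encoder applied to, e.g., a text prompt).
centers = {0: np.array([2.0, 2.0]), 1: np.array([-2.0, -2.0])}
sigma = 0.5  # assumed width of the class-adaptive Gaussian prior

def sample_prior(class_id, n):
    """Draw x0 from N(center_c, sigma^2 I) instead of the usual N(0, I)."""
    c = centers[class_id]
    return c + sigma * rng.standard_normal((n, c.shape[0]))

def flow_matching_pair(x0, x1, t):
    """Linear interpolation x_t and the conditional flow matching
    regression target, the constant velocity x1 - x0."""
    xt = (1.0 - t)[:, None] * x0 + t[:, None] * x1
    v_target = x1 - x0
    return xt, v_target

# Toy usage: target samples of class 0 lie near that class's center,
# so source-to-target transport paths are short on average.
x1 = centers[0] + 0.1 * rng.standard_normal((4, 2))
x0 = sample_prior(0, 4)
t = rng.uniform(size=4)
xt, v = flow_matching_pair(x0, x1, t)
avg_path_len = np.linalg.norm(x1 - x0, axis=1).mean()
```

A network `v_theta(xt, t, class_id)` would then be trained to match `v_target`; because `x0` already sits near the conditional mode, the learned paths are shorter than those from a shared unimodal prior, which is what enables fewer sampling steps.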