🤖 AI Summary
Consistency models (CMs) are conventionally trained by distilling pre-trained diffusion or flow models, which limits flexibility and adds computational overhead.
Method: This paper proposes a pre-training-free flow-map self-distillation paradigm. It exploits the analytical equivalence between the time derivative of the flow map and the velocity field, converting the conventional two-stage distillation pipeline into end-to-end flow-map learning. The approach combines flow-map parameterization, higher-order derivative regularization, and continuous-time modeling to enable task-adaptive objective design.
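The equivalence the method exploits can be stated compactly. The notation below is assumed, since the summary does not define symbols: $X_{s,t}$ denotes the two-time flow map (sending a state at time $s$ to its state at time $t$) and $b_t$ the velocity field of the probability flow.

```latex
% Lagrangian equation satisfied by the flow map:
\partial_t X_{s,t}(x) = b_t\big(X_{s,t}(x)\big), \qquad X_{s,s}(x) = x.
% Evaluating at the diagonal s = t recovers the velocity field, which is
% what lets a pre-trained teacher be replaced by the model's own
% instantaneous rate of change (self-distillation):
b_t(x) = \partial_t X_{s,t}(x)\,\big|_{s=t}.
```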
Contribution/Results: Experiments demonstrate significant improvements in sampling efficiency and synthesis quality for high-dimensional tasks (e.g., image generation). For low-dimensional tasks, higher-order derivative constraints enhance detail fidelity, accelerate convergence, and improve inference stability. This work establishes the first fully pre-training-free, theoretically interpretable, efficient, and robust consistency modeling framework, bridging theoretical rigor with practical performance across diverse domains.
📝 Abstract
Building on the framework proposed in Boffi et al. (2024), we present a systematic approach for learning flow maps associated with flow and diffusion models. Flow map-based models, commonly known as consistency models, encompass recent efforts to improve the efficiency of generative models based on solutions to differential equations. By exploiting a relationship between the velocity field underlying a continuous-time flow and the instantaneous rate of change of the flow map, we show how to convert existing distillation schemes into direct training algorithms via self-distillation, eliminating the need for pre-trained models. We empirically evaluate several instantiations of our framework, finding that high-dimensional tasks like image synthesis benefit from objective functions that avoid temporal and spatial derivatives of the flow map, while lower-dimensional tasks can benefit from objectives incorporating higher-order derivatives to capture sharp features.
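As a concrete illustration of the velocity/flow-map relationship the abstract describes, the sketch below checks the identity numerically on a toy one-dimensional ODE. This is a hypothetical example, not the paper's code: the ODE, function names, and finite-difference check are all ours.

```python
# Toy check of the identity underlying self-distillation:
# for dx/dt = b_t(x) = -x, the two-time flow map is
# X_{s,t}(x) = x * exp(-(t - s)).
import math

def velocity(t, x):
    # b_t(x) for the toy ODE dx/dt = -x
    return -x

def flow_map(s, t, x):
    # X_{s,t}(x): state at time t of the trajectory passing through x at time s
    return x * math.exp(-(t - s))

def d_dt_flow_map(s, t, x, h=1e-6):
    # Central finite difference approximating the time derivative of the flow map
    return (flow_map(s, t + h, x) - flow_map(s, t - h, x)) / (2 * h)

s, t, x = 0.2, 0.9, 1.5

# Lagrangian identity: d/dt X_{s,t}(x) = b_t(X_{s,t}(x))
lhs = d_dt_flow_map(s, t, x)
rhs = velocity(t, flow_map(s, t, x))
print(abs(lhs - rhs) < 1e-6)  # True

# Diagonal identity used for self-distillation:
# b_t(x) equals d/dt X_{s,t}(x) evaluated at s = t
print(abs(d_dt_flow_map(t, t, x) - velocity(t, x)) < 1e-6)  # True
```

In the actual method both the flow map and the velocity are outputs of a single learned network, and the finite difference above is replaced by exact derivatives of the model; the point here is only that the teacher signal is recoverable from the flow map itself, with no pre-trained model required.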