🤖 AI Summary
Consistency models suffer from a fundamental mismatch between their training and distillation objectives in continuous time, arising from systematic bias in the single-sample Monte Carlo estimation of the velocity field, which leads to slow convergence and sampling inconsistency. To address this, we propose Generator-Augmented Flow (GAF), the first framework to rigorously bridge continuous-time consistency training and distillation via theoretical analysis. GAF explicitly models the transport path from noisy data to target outputs by jointly optimizing a continuous-time flow, a consistency mapping, and a neural generator. We prove that GAF reduces both the training–distillation discrepancy and the optimal transport cost. Experiments show that GAF significantly accelerates convergence and improves both FID and sampling consistency on benchmarks including ImageNet, outperforming state-of-the-art methods.
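The generator-augmented coupling sketched above can be illustrated in one dimension. This is a hypothetical toy, not the paper's implementation: the one-step generator `g` and the two-mode data are illustrative assumptions. Pairing each noise sample with the output the generator assigns to it, rather than with an independent data sample, shortens the transport paths:

```python
# Toy sketch (assumed setup, not the paper's model): compare the transport cost
# of an independent noise-data coupling with a generator-induced coupling.
import numpy as np

rng = np.random.default_rng(0)
noise = rng.standard_normal(10_000)

def g(z):
    # Stand-in one-step generator: maps noise to the nearer of two data modes.
    return np.where(z >= 0.0, 2.0, -2.0)

data = rng.choice([-2.0, 2.0], size=10_000)        # independent coupling
cost_independent = np.mean((data - noise) ** 2)    # cost of random noise-data pairs
cost_generator = np.mean((g(noise) - noise) ** 2)  # cost of generator-induced pairs

print(cost_independent, cost_generator)
```

In this toy, the generator-induced coupling has strictly lower quadratic cost, mirroring the claimed reduction in optimal transport cost.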
📝 Abstract
Consistency models imitate the multi-step sampling of score-based diffusion in a single forward pass of a neural network. They can be learned in two ways: consistency distillation and consistency training. The former relies on the true velocity field of the corresponding differential equation, approximated by a pre-trained neural network. The latter instead uses a single-sample Monte Carlo estimate of this velocity field. We show that the resulting estimation error induces a discrepancy between consistency distillation and training that persists even in the continuous-time limit. To alleviate this issue, we propose a novel flow that transports noisy data towards the corresponding outputs of a consistency model. We prove that this flow reduces both the previously identified discrepancy and the noise-data transport cost. Consequently, our method not only accelerates the convergence of consistency training but also enhances its overall performance. The code is available at: https://github.com/thibautissenhuth/consistency_GC.
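The estimation gap between the two learning modes can be made concrete with a minimal 1-D sketch (a hypothetical illustration, not the paper's setup): consistency training regresses on single-sample velocity targets, while a distillation teacher approximates their conditional expectation given the interpolated point:

```python
# Toy sketch (assumed 1-D setup): single-sample Monte Carlo velocity targets vs.
# the conditional-expectation velocity a pre-trained teacher would approximate.
import numpy as np

rng = np.random.default_rng(0)

# Two-mode "data" distribution and a Gaussian noise prior.
data = rng.choice([-2.0, 2.0], size=10_000)
noise = rng.standard_normal(10_000)

t = 0.5                             # interpolation time
x_t = (1.0 - t) * noise + t * data  # linear interpolant between noise and data
v_single = data - noise             # single-sample velocity target (training)

# True velocity field v(x, t) = E[data - noise | x_t = x]: average the
# single-sample targets over couplings whose interpolants land near x.
x_query = 0.1
mask = np.abs(x_t - x_query) < 0.05
v_true = v_single[mask].mean()      # what distillation's teacher approximates

print("conditional velocity:", v_true)
print("single-sample spread:", v_single[mask].std())
```

The single-sample targets at a fixed point scatter widely around their conditional mean, which is the estimation error that separates consistency training from distillation.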