🤖 AI Summary
Consistency Models (CMs) lack a post-hoc guidance mechanism that works without a diffusion teacher, which makes it difficult to flexibly balance generation fidelity and diversity. This work proposes Joint Flow Distribution Learning (JFDL), a lightweight, teacher-free method that aligns the unconditional and conditional velocity fields of a pretrained CM. The method is supported by normality tests showing that the variance-exploding noise implied by these velocity fields is Gaussian. JFDL thereby enables tunable guidance for consistency models that originally supported only conditional generation. Experiments on CIFAR-10 and ImageNet 64×64 show that the method reduces FID and produces guided images with characteristics similar to those of Classifier-Free Guidance (CFG), unlocking flexible, controllable synthesis in CMs.
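For reference, the "tunable guidance" being compared against is the standard CFG rule. It combines an unconditional prediction (null condition $\varnothing$) and a conditional prediction (condition $c$) with a guidance weight $w$. The formula is written here in velocity form. It is the generic CFG combination and is not necessarily how JFDL itself injects guidance into a CM:

$$
\tilde{v}_w(x_t, t, c) = v(x_t, t, \varnothing) + w\,\big(v(x_t, t, c) - v(x_t, t, \varnothing)\big)
$$

Setting $w = 0$ recovers unconditional sampling and $w = 1$ recovers conditional sampling. Values $w > 1$ push samples toward higher fidelity at the cost of diversity.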
📝 Abstract
Classifier-free Guidance (CFG) lets practitioners trade off fidelity against diversity in Diffusion Models (DMs). However, the practicality of CFG is hindered by the sampling cost of DMs. Consistency Models (CMs), on the other hand, generate images in one or a few steps. Yet existing guidance methods for CMs require knowledge distillation from a separate DM teacher, which limits CFG to Consistency Distillation (CD) methods. We propose Joint Flow Distribution Learning (JFDL), a lightweight alignment method that enables guidance in a pre-trained CM. Using a pre-trained CM as an ordinary differential equation (ODE) solver, we verify with normality tests that the variance-exploding noise implied by the velocity fields of the unconditional and conditional distributions is Gaussian. In practice, JFDL equips CMs with the familiar adjustable guidance knob and yields guided images with characteristics similar to those of CFG. We apply JFDL to an original Consistency Trained (CT) CM that could only perform conditional sampling. JFDL unlocks guided generation for this model and reduces FID on both the CIFAR-10 and ImageNet 64×64 datasets. This is the first time that CMs can receive effective post-hoc guidance without a DM teacher, bridging a key gap in current methods for CMs.
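The normality check can be made concrete with a minimal sketch. It assumes the EDM-style variance-exploding parametrization $x_\sigma = x_0 + \sigma\varepsilon$. Under that model, a clean-image estimate $\hat{x}_0$ implies the noise $\hat{\varepsilon} = (x_\sigma - \hat{x}_0)/\sigma$, which also equals the probability-flow ODE velocity $dx/d\sigma$. Several details are assumptions for illustration and are not the paper's stated procedure: the helper names `implied_ve_noise` and `gaussianity_pvalue`, the use of the D'Agostino-Pearson test (`scipy.stats.normaltest`), and the synthetic data with an oracle estimate standing in for a trained CM.

```python
import numpy as np
from scipy import stats


def implied_ve_noise(x_sigma, x0_hat, sigma):
    """Noise implied by a clean estimate under the VE model x_sigma = x0 + sigma * eps.

    In the EDM-style VE parametrization this also equals the probability-flow
    ODE velocity dx/dsigma = (x - D(x; sigma)) / sigma, with D the clean estimate.
    """
    return (x_sigma - x0_hat) / sigma


def gaussianity_pvalue(eps_hat):
    """D'Agostino-Pearson K^2 normality test over all entries (flattened).

    Small p-values are evidence against Gaussianity.
    """
    return stats.normaltest(np.ravel(eps_hat)).pvalue


rng = np.random.default_rng(0)
# Synthetic, non-Gaussian stand-in for a batch of images with shape (N, C, H, W).
x0 = rng.uniform(-1.0, 1.0, size=(64, 3, 8, 8))
sigma = 2.0
eps = rng.standard_normal(x0.shape)
x_sigma = x0 + sigma * eps

# Hypothetical oracle "model": a perfect clean estimate stands in for a trained CM.
eps_hat = implied_ve_noise(x_sigma, x0, sigma)
p_gaussian = gaussianity_pvalue(eps_hat)

# Control: unit-variance uniform noise, which the test should clearly reject.
eps_uniform = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=x0.shape)
p_uniform = gaussianity_pvalue(eps_uniform)

print(f"oracle implied noise p = {p_gaussian:.3g}")
print(f"uniform control p      = {p_uniform:.3g}")
```

With a trained model, `x0_hat` would presumably come from the CM's clean-image estimate at noise level `sigma` rather than from the oracle. How the actual test is wired into a CM is an assumption here. The uniform control is included so that the test's rejection behavior is visible next to the Gaussian case.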