🤖 AI Summary
Consistency models (CMs) currently rely on knowledge distillation from pretrained diffusion models (DMs) to emulate classifier-free guidance (CFG), resulting in high computational overhead, limited flexibility, and saturation artifacts at high guidance scales. To address these limitations, we propose iGCT, the first fully data-driven, DM-free, and distillation-free framework for controllable CM training. Its core innovations are: (1) a reversible network architecture coupled with reversible guidance training, enabling end-to-end guided learning; and (2) a guidance-consistency loss that explicitly models conditional guidance dynamics to mitigate artifacts. Evaluated on CIFAR-10 and ImageNet64, iGCT achieves substantial improvements in FID and precision. At guidance scale 13, it attains a precision of 0.80 (a 69% relative improvement over the baseline DM's 0.47) while significantly reducing both training and inference costs.
📝 Abstract
Guidance in image generation steers models towards higher-quality or more targeted outputs, typically achieved in Diffusion Models (DMs) via Classifier-free Guidance (CFG). However, recent Consistency Models (CMs), which require fewer function evaluations, rely on distilling CFG knowledge from pretrained DMs to achieve guidance, making them costly and inflexible. In this work, we propose invertible Guided Consistency Training (iGCT), a novel, entirely data-driven training framework for guided CMs. iGCT enables fast, guided image generation and editing without requiring the training or distillation of DMs, greatly reducing the overall compute requirements. iGCT also addresses the saturation artifacts seen in CFG under high guidance scales. Our extensive experiments on CIFAR-10 and ImageNet64 show that iGCT significantly improves FID and precision compared to CFG. At a guidance scale of 13, iGCT improves precision to 0.8, while the DM's drops to 0.47. Our work takes the first step toward enabling guidance and inversion for CMs without relying on DMs.
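For context, the standard CFG rule that the abstract refers to combines a conditional and an unconditional noise prediction by extrapolating with a guidance scale w; the saturation artifacts at high w arise because the combined prediction moves far outside the range of either input. A minimal sketch (variable names are illustrative, not from the paper):

```python
def cfg_combine(eps_uncond: float, eps_cond: float, w: float) -> float:
    """Classifier-free guidance: extrapolate the conditional prediction
    away from the unconditional one by guidance scale w.
    w = 0 -> purely unconditional, w = 1 -> purely conditional,
    w > 1 -> amplified guidance (risk of saturation at large w)."""
    return eps_uncond + w * (eps_cond - eps_uncond)

# Toy scalar "noise predictions" for a single value:
eps_u, eps_c = 0.2, 0.5
moderate = cfg_combine(eps_u, eps_c, 1.0)   # stays at the conditional prediction
extreme = cfg_combine(eps_u, eps_c, 13.0)   # lands far outside [eps_u, eps_c]
print(moderate, extreme)
```

At w = 13 (the scale reported in the abstract) the output is roughly 4.1, more than an order of magnitude beyond either prediction, which is why high guidance scales push pixel values toward clipping in DMs; iGCT trains guidance into the CM directly rather than applying this extrapolation at sampling time.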