Efficient Distillation of Classifier-Free Guidance using Adapters

📅 2025-03-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the inefficiency of classifier-free guidance (CFG) in conditional diffusion models—specifically, the doubling of neural function evaluations (NFEs) during inference—this paper proposes a lightweight CFG distillation framework. It freezes a pre-trained base model and fine-tunes only an adapter comprising approximately 2% of the total parameters. Crucially, it introduces single-step forward distillation directly on *real* CFG-guided trajectories, eliminating the mismatch between training and inference. The distilled adapter is plug-and-play across checkpoints derived from the same base model. Experiments demonstrate that the method matches or surpasses standard CFG in FID while halving NFEs and doubling sampling speed. Notably, distillation of a 2.6B-parameter model completes on a single 24GB consumer-grade GPU, significantly improving the deployment efficiency and practicality of CFG.

📝 Abstract
While classifier-free guidance (CFG) is essential for conditional diffusion models, it doubles the number of neural function evaluations (NFEs) per inference step. To mitigate this inefficiency, we introduce adapter guidance distillation (AGD), a novel approach that simulates CFG in a single forward pass. AGD leverages lightweight adapters to approximate CFG, effectively doubling the sampling speed while maintaining or even improving sample quality. Unlike prior guidance distillation methods that tune the entire model, AGD keeps the base model frozen and only trains minimal additional parameters (~2%) to significantly reduce the resource requirement of the distillation phase. Additionally, this approach preserves the original model weights and enables the adapters to be seamlessly combined with other checkpoints derived from the same base model. We also address a key mismatch between training and inference in existing guidance distillation methods by training on CFG-guided trajectories instead of standard diffusion trajectories. Through extensive experiments, we show that AGD achieves comparable or superior FID to CFG across multiple architectures with only half the NFEs. Notably, our method enables the distillation of large models (~2.6B parameters) on a single consumer GPU with 24 GB of VRAM, making it more accessible than previous approaches that require multiple high-end GPUs. We will publicly release the implementation of our method.
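The NFE doubling the abstract refers to comes from the standard CFG combination rule, which needs both a conditional and an unconditional prediction at every sampling step. A minimal sketch of that rule (function and variable names here are illustrative, not from the paper):

```python
import numpy as np

def cfg_combine(eps_cond, eps_uncond, w):
    """Standard CFG update: eps_uncond + w * (eps_cond - eps_uncond).

    Producing eps_cond and eps_uncond requires two forward passes of
    the denoiser (2 NFEs) per sampling step; AGD's adapter aims to
    reproduce this output in a single pass.
    """
    return eps_uncond + w * (eps_cond - eps_uncond)

# Toy noise predictions standing in for real model outputs.
eps_c = np.array([1.0, 2.0])
eps_u = np.array([0.5, 1.0])
guided = cfg_combine(eps_c, eps_u, w=3.0)  # -> [2.0, 4.0]
```

With guidance scale `w > 1`, the combined prediction is pushed away from the unconditional one, which is why CFG improves conditional fidelity but costs twice the compute.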
Problem

Research questions and friction points this paper is trying to address.

Reduces computational cost of classifier-free guidance
Maintains sample quality with fewer neural evaluations
Enables distillation on consumer hardware with minimal parameters
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adapter guidance distillation for CFG simulation
Lightweight adapters double sampling speed
Frozen base model with minimal parameter training
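The idea behind these contributions can be sketched in a linear toy model: freeze the base weights, attach a small trainable adapter, and regress the adapted single-pass output onto the true two-pass CFG output. All names and shapes below are illustrative assumptions, not the paper's architecture, and the training states are random stand-ins for the CFG-guided trajectories the paper actually distills on.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "base model" weights: an image path and a conditioning path.
W_base = rng.standard_normal((8, 8)) * 0.1
V_cond = rng.standard_normal((8, 8)) * 0.1

def base(x, c):
    """One forward pass of the frozen base model."""
    return W_base @ x + V_cond @ c

def cfg(x, c, w):
    """Standard CFG: two forward passes (conditional + unconditional)."""
    uncond = base(x, np.zeros_like(c))
    return uncond + w * (base(x, c) - uncond)

# Trainable adapter on the conditioning path; base weights never change.
adapter = np.zeros((8, 8))

def adapted(x, c):
    """Single forward pass intended to mimic the CFG output."""
    return base(x, c) + adapter @ c

# Distillation: regress the single-pass output onto the CFG output.
w, lr = 3.0, 0.02
for _ in range(500):
    x = rng.standard_normal(8)
    c = rng.standard_normal(8)
    err = adapted(x, c) - cfg(x, c, w)
    adapter -= lr * np.outer(err, c)  # gradient of 0.5 * ||err||^2
```

In this linear toy the optimum is `adapter = (w - 1) * V_cond`, at which point the single adapted pass reproduces CFG exactly; a real nonlinear adapter only approximates it, which is why the paper trains on actual CFG-guided trajectories to keep training and inference consistent.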