🤖 AI Summary
Diffusion models suffer from slow sampling due to multi-step denoising, and existing distillation methods—while accelerating inference—exacerbate covariate shift, leading to error accumulation and degraded sample quality. To address this, we propose Diffusion Distillation as Imitation Learning (DDIL), a framework that formulates diffusion distillation as an imitation learning problem. DDIL mitigates covariate shift via bidirectional distribution alignment: forward alignment enforces matching of the marginal data distribution, while backward alignment corrects student-model bias on the student's own generated trajectories. Furthermore, we adopt the reflected diffusion formulation to enhance training stability and distillation efficiency. By jointly optimizing over both forward and backward training distributions, DDIL improves generative diversity and fidelity while significantly boosting image quality under few-step sampling (e.g., 4–8 steps). Extensive experiments demonstrate that DDIL outperforms baseline methods—including PD, LCM, and DMD2—on multiple benchmarks, with superior training stability and generalization.
📝 Abstract
Diffusion models excel at generative modeling (e.g., text-to-image), but sampling requires multiple passes through the denoising network, limiting practicality. Efforts such as progressive distillation and consistency distillation have shown promise by reducing the number of passes, at the expense of the quality of the generated samples. In this work we identify covariate shift as one reason for the poor performance of multi-step distilled models, arising from compounding error at inference time. To address covariate shift, we formulate diffusion distillation within an imitation learning framework (DDIL) and enhance the training distribution for distilling diffusion models on both the data distribution (forward diffusion) and student-induced distributions (backward diffusion). Training on the data distribution helps diversify generations by preserving the marginal data distribution, while training on the student distribution addresses compounding error by correcting covariate shift. In addition, we adopt the reflected diffusion formulation for distillation and demonstrate improved performance and stable training across different distillation methods. We show that DDIL consistently improves on the baseline algorithms of progressive distillation (PD), latent consistency models (LCM), and distribution matching distillation (DMD2).
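The abstract's core idea—sampling training states from both the forward (data) branch and the backward (student rollout) branch, in the spirit of DAgger-style imitation learning—can be illustrated with a minimal sketch. All names, the toy noise schedule, and the 50/50 mixing ratio below are illustrative assumptions, not the paper's actual implementation:

```python
# Hypothetical sketch of DDIL's mixed training distribution.
# Names (forward_diffuse, student_rollout, ddil_batch) and the toy
# variance-preserving schedule are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def forward_diffuse(x0, t):
    """Noise real data x0 to timestep t (toy schedule: alpha = 1 - t)."""
    alpha = 1.0 - t
    return np.sqrt(alpha) * x0 + np.sqrt(1.0 - alpha) * rng.standard_normal(x0.shape)

def student_rollout(student_step, t, dim=2):
    """Run the student's own sampler from pure noise down to timestep t."""
    x = rng.standard_normal(dim)
    for s in np.linspace(1.0, t, num=4, endpoint=False):
        x = student_step(x, s)
    return x

def ddil_batch(data, student_step, mix=0.5):
    """Draw one training state from the forward (data) branch or the
    backward (student) branch, DAgger-style dataset aggregation."""
    t = rng.uniform(0.1, 0.9)
    if rng.uniform() < mix:
        # Forward branch: noised real data, preserves the data marginals.
        x0 = data[rng.integers(len(data))]
        return forward_diffuse(x0, t), t
    # Backward branch: the student's own states, corrects covariate shift.
    return student_rollout(student_step, t), t

# Toy student that merely shrinks its input each step (stand-in for a network).
toy_student = lambda x, t: 0.9 * x
data = rng.standard_normal((16, 2))
x_t, t = ddil_batch(data, toy_student)
```

In a real distillation loop, `x_t` from either branch would be fed to both teacher and student, with a distillation loss pulling the student's prediction toward the teacher's; the backward branch is what exposes the student to its own off-distribution states.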