🤖 AI Summary
This work addresses the degradation in generation quality and diversity commonly observed when discrete diffusion models are distilled to fewer sampling steps. The authors propose Discrete Moment Matching Distillation (D-MMD), which adapts the moment-matching principle—previously effective in continuous diffusion—to the discrete domain through a novel Discrete Maximum Mean Discrepancy (Discrete MMD) formulation. This approach enables efficient knowledge transfer while circumventing the mode collapse common to existing distillation methods. Experiments on both text and image datasets show that the distilled models preserve high fidelity and diversity despite using significantly fewer sampling steps, and can even surpass the original teacher model in performance.
📝 Abstract
It is currently difficult to distill discrete diffusion models. In contrast, the continuous diffusion literature has many distillation methods that can reduce sampling to a handful of steps.
Our method, Discrete Moment Matching Distillation (D-MMD), leverages ideas that have been highly successful in the continuous domain. Whereas previous discrete distillation methods suffer from mode collapse, D-MMD maintains high quality and diversity (given a sufficient number of sampling steps). We demonstrate this on both text and image datasets. Moreover, the distilled generators can even outperform their teachers.
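The paper's exact Discrete MMD objective is not given in this excerpt. As a rough illustration of the underlying moment-matching idea only, the sketch below computes a standard kernel MMD between two sets of discrete samples (e.g. teacher vs. student outputs), with tokens embedded as one-hot vectors. All names here (`one_hot`, `mmd2`) and the RBF kernel choice are illustrative assumptions, not the authors' formulation.

```python
import numpy as np

def one_hot(tokens, vocab_size):
    """Embed integer token sequences (n, seq_len) as flattened one-hot vectors."""
    eye = np.eye(vocab_size)
    return eye[tokens].reshape(len(tokens), -1)

def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian RBF kernel matrix between rows of x and rows of y."""
    sq_dists = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def mmd2(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD; zero iff the
    kernel mean embeddings of the two sample sets coincide."""
    return (rbf_kernel(x, x, bandwidth).mean()
            - 2.0 * rbf_kernel(x, y, bandwidth).mean()
            + rbf_kernel(y, y, bandwidth).mean())

# Hypothetical usage: compare teacher samples against student samples.
rng = np.random.default_rng(0)
teacher_tokens = rng.integers(0, 4, size=(8, 5))   # 8 sequences, length 5, vocab 4
student_tokens = rng.integers(0, 4, size=(8, 5))
gap = mmd2(one_hot(teacher_tokens, 4), one_hot(student_tokens, 4))
```

In a distillation setting, a discrepancy like `gap` (or its gradient through a relaxation, since the one-hot embedding above is not differentiable) would drive the student's sample distribution toward the teacher's; the biased estimator is non-negative and vanishes when the two sample sets match.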