🤖 AI Summary
This work addresses the degradation in generation quality and diversity commonly observed when discrete diffusion models are distilled to fewer sampling steps. The authors propose Discrete Moment Matching Distillation (D-MMD), which adapts the moment-matching principle—previously effective in continuous diffusion—to the discrete domain through a novel Discrete Maximum Mean Discrepancy (Discrete MMD) formulation. This approach enables efficient knowledge transfer while circumventing the mode collapse common to existing distillation methods. Experiments on both text and image datasets show that the distilled models preserve high fidelity and diversity despite using significantly fewer sampling steps, and can even surpass the original teacher model in performance.
📝 Abstract
It is currently difficult to distill discrete diffusion models. In contrast, the continuous diffusion literature has many distillation methods that can reduce sampling to a handful of steps.
Our method, Discrete Moment Matching Distillation (D-MMD), leverages ideas that have been highly successful in the continuous domain. Whereas previous discrete distillation methods suffer from mode collapse, D-MMD maintains high quality and diversity (given a sufficient number of sampling steps). We demonstrate this on both text and image datasets. Moreover, the distilled generators can even outperform their teachers.
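The paper's exact Discrete MMD objective is not given in this excerpt. As a rough illustration of the underlying moment-matching idea only, the sketch below computes a standard kernel MMD between two sets of discrete samples (e.g. teacher vs. student outputs), with tokens embedded as one-hot vectors. All names here (`one_hot`, `mmd2`) and the RBF kernel choice are illustrative assumptions, not the authors' formulation.

```python
import numpy as np

def one_hot(tokens, vocab_size):
    """Embed integer token sequences (n, seq_len) as flattened one-hot vectors."""
    eye = np.eye(vocab_size)
    return eye[tokens].reshape(len(tokens), -1)

def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian RBF kernel matrix between rows of x and rows of y."""
    sq_dists = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def mmd2(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD; zero iff the
    kernel mean embeddings of the two sample sets coincide."""
    return (rbf_kernel(x, x, bandwidth).mean()
            - 2.0 * rbf_kernel(x, y, bandwidth).mean()
            + rbf_kernel(y, y, bandwidth).mean())

# Hypothetical usage: compare teacher samples against student samples.
rng = np.random.default_rng(0)
teacher_tokens = rng.integers(0, 4, size=(8, 5))   # 8 sequences, length 5, vocab 4
student_tokens = rng.integers(0, 4, size=(8, 5))
gap = mmd2(one_hot(teacher_tokens, 4), one_hot(student_tokens, 4))
```

In a distillation setting, a discrepancy like `gap` (or its gradient through a relaxation, since the one-hot embedding above is not differentiable) would drive the student's sample distribution toward the teacher's; the biased estimator is non-negative and vanishes when the two sample sets match.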