Distillation of Discrete Diffusion through Dimensional Correlations

📅 2024-10-11
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Discrete diffusion models suffer from slow sampling and from difficulty in modeling high-dimensional interdependencies, such as pixel-wise spatial relationships or token-level sequential dependencies. To address these challenges, this work proposes a scalable hybrid modeling framework and a novel dimension-aware distillation loss, and is the first to explicitly incorporate dimension-specific dependency modeling into the discrete diffusion distillation pipeline. Theoretically, it characterizes the fundamental trade-off between element-wise independence approximations and multi-step sampling, and resolves it via a stride-aware knowledge transfer loss for efficient compression. By integrating hybrid probabilistic modeling with high-dimensional joint-distribution approximation, the method compresses pre-trained discrete diffusion models from 100 sampling steps to just 5 while preserving generation quality on both image and language tasks, accelerating inference by over 20×.
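The core idea of step compression can be illustrated on a toy Markov chain: a "teacher" applied for several steps defines a composed transition, and a one-step "student" is fit to match it. The sketch below is purely illustrative (the chain, the row-wise KL objective, and all variable names are assumptions, not the paper's actual loss or implementation):

```python
import numpy as np

# Hypothetical sketch (not the paper's method): distill five teacher
# steps of a toy 3-state Markov chain into a single student step by
# minimizing a row-wise KL divergence between the student's one-step
# transition and the teacher's composed five-step transition.

rng = np.random.default_rng(0)
teacher = rng.random((3, 3))
teacher /= teacher.sum(axis=1, keepdims=True)   # one-step transition
target = np.linalg.matrix_power(teacher, 5)     # five composed steps

logits = np.zeros((3, 3))                       # student parameters
for _ in range(500):
    student = np.exp(logits)
    student /= student.sum(axis=1, keepdims=True)
    # gradient of KL(target || softmax(logits)) w.r.t. the logits
    logits -= 0.5 * (student - target)

student = np.exp(logits)
student /= student.sum(axis=1, keepdims=True)
kl = (target * np.log(target / student)).sum()
print(kl)  # near zero: one student step reproduces five teacher steps
```

In this tabular setting the target is available in closed form, so the fit is exact; the paper's contribution is making an analogous objective tractable when the joint distribution over many discrete dimensions cannot be enumerated.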

📝 Abstract
Diffusion models have demonstrated exceptional performance in various fields of generative modeling, but suffer from slow sampling speed due to their iterative nature. While this issue is being addressed in continuous domains, discrete diffusion models face unique challenges, particularly in capturing dependencies between elements (e.g., pixel relationships in images, sequential dependencies in language), mainly due to the computational cost of processing high-dimensional joint distributions. In this paper, (i) we propose "mixture" models for discrete diffusion that are capable of treating dimensional correlations while remaining scalable, and (ii) we provide a set of loss functions for distilling the iterations of existing models. Two primary theoretical insights underpin our approach: first, conventional models with element-wise independence can approximate the data distribution well, but essentially require many sampling steps; second, our loss functions enable the mixture models to distill such many-step conventional models into just a few steps by learning the dimensional correlations. Our experimental results show the effectiveness of the proposed method in distilling pretrained discrete diffusion models across image and language domains.
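Why element-wise independence fails in a single step, and why a mixture fixes it, can be seen on a minimal two-dimensional example. The sketch below is an assumption-laden toy (two perfectly correlated binary dimensions; all names are illustrative), not the paper's model:

```python
# Toy example: two binary dimensions that are perfectly correlated
# (x1 == x2 always). A factorized (element-wise independent) model can
# only match the marginals, so it leaks probability 0.25 onto each
# impossible state. A 2-component mixture of factorized distributions
# recovers the joint exactly. Illustrative only, not the paper's API.

states = [(0, 0), (0, 1), (1, 0), (1, 1)]
data_joint = {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}

# Factorized model: product of the data marginals.
p1 = sum(p for (a, b), p in data_joint.items() if a == 1)  # = 0.5
p2 = sum(p for (a, b), p in data_joint.items() if b == 1)  # = 0.5
factorized = {(a, b): (p1 if a else 1 - p1) * (p2 if b else 1 - p2)
              for (a, b) in states}

# Mixture model: two factorized components, ((q1, q2), weight).
# Component 0 puts all mass on (0, 0), component 1 on (1, 1).
components = [((0.0, 0.0), 0.5), ((1.0, 1.0), 0.5)]
mixture = {}
for (a, b) in states:
    mixture[(a, b)] = sum(
        w * (q1 if a else 1 - q1) * (q2 if b else 1 - q2)
        for (q1, q2), w in components
    )

print(factorized[(0, 1)])  # 0.25: mass leaked onto an impossible state
print(mixture[(0, 1)])     # 0.0: the mixture captures the correlation
```

This is the scalability argument in miniature: each mixture component stays factorized (cheap to evaluate and sample), yet the mixture as a whole represents dimensional correlations that no single factorized model can.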
Problem

Research questions and friction points this paper is trying to address.

Discrete Diffusion Models
Sampling Speed
High-Dimensional Distributions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixture Models
Dimensional Correlations
Loss Functions