Distillation of Discrete Diffusion through Dimensional Correlations

📅 2024-10-11
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Discrete diffusion models suffer from slow sampling and from difficulty in modeling high-dimensional interdependencies, such as pixel-wise spatial relationships or token-level sequential dependencies. To address these challenges, this work proposes a scalable hybrid modeling framework and a novel dimension-aware distillation loss, and is the first to explicitly incorporate dimension-specific dependency modeling into the discrete diffusion distillation pipeline. Theoretically, it characterizes the fundamental trade-off between element-wise independence approximations and multi-step sampling, and resolves it via a stride-aware knowledge transfer loss for efficient compression. By integrating hybrid probabilistic modeling with high-dimensional joint-distribution approximation, the method compresses pre-trained discrete diffusion models from 100 sampling steps to just 5 while preserving generation quality on both image and language tasks, accelerating inference by over 20×.
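The core idea of step compression can be illustrated on a toy Markov chain: a "teacher" applied for several steps defines a composed transition, and a one-step "student" is fit to match it. The sketch below is purely illustrative (the chain, the row-wise KL objective, and all variable names are assumptions, not the paper's actual loss or implementation):

```python
import numpy as np

# Hypothetical sketch (not the paper's method): distill five teacher
# steps of a toy 3-state Markov chain into a single student step by
# minimizing a row-wise KL divergence between the student's one-step
# transition and the teacher's composed five-step transition.

rng = np.random.default_rng(0)
teacher = rng.random((3, 3))
teacher /= teacher.sum(axis=1, keepdims=True)   # one-step transition
target = np.linalg.matrix_power(teacher, 5)     # five composed steps

logits = np.zeros((3, 3))                       # student parameters
for _ in range(500):
    student = np.exp(logits)
    student /= student.sum(axis=1, keepdims=True)
    # gradient of KL(target || softmax(logits)) w.r.t. the logits
    logits -= 0.5 * (student - target)

student = np.exp(logits)
student /= student.sum(axis=1, keepdims=True)
kl = (target * np.log(target / student)).sum()
print(kl)  # near zero: one student step reproduces five teacher steps
```

In this tabular setting the target is available in closed form, so the fit is exact; the paper's contribution is making an analogous objective tractable when the joint distribution over many discrete dimensions cannot be enumerated.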

📝 Abstract
Diffusion models have demonstrated exceptional performance in various fields of generative modeling, but suffer from slow sampling speed due to their iterative nature. While this issue is being addressed in continuous domains, discrete diffusion models face unique challenges, particularly in capturing dependencies between elements (e.g., pixel relationships in images, sequential dependencies in language), mainly due to the computational cost of processing high-dimensional joint distributions. In this paper, (i) we propose "mixture" models for discrete diffusion that are capable of treating dimensional correlations while remaining scalable, and (ii) we provide a set of loss functions for distilling the iterations of existing models. Two primary theoretical insights underpin our approach: first, conventional models with element-wise independence can approximate the data distribution well, but essentially require many sampling steps; second, our loss functions enable the mixture models to distill such many-step conventional models into just a few steps by learning the dimensional correlations. Our experimental results show the effectiveness of the proposed method in distilling pretrained discrete diffusion models across image and language domains.
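Why element-wise independence fails in a single step, and why a mixture fixes it, can be seen on a minimal two-dimensional example. The sketch below is an assumption-laden toy (two perfectly correlated binary dimensions; all names are illustrative), not the paper's model:

```python
# Toy example: two binary dimensions that are perfectly correlated
# (x1 == x2 always). A factorized (element-wise independent) model can
# only match the marginals, so it leaks probability 0.25 onto each
# impossible state. A 2-component mixture of factorized distributions
# recovers the joint exactly. Illustrative only, not the paper's API.

states = [(0, 0), (0, 1), (1, 0), (1, 1)]
data_joint = {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}

# Factorized model: product of the data marginals.
p1 = sum(p for (a, b), p in data_joint.items() if a == 1)  # = 0.5
p2 = sum(p for (a, b), p in data_joint.items() if b == 1)  # = 0.5
factorized = {(a, b): (p1 if a else 1 - p1) * (p2 if b else 1 - p2)
              for (a, b) in states}

# Mixture model: two factorized components, ((q1, q2), weight).
# Component 0 puts all mass on (0, 0), component 1 on (1, 1).
components = [((0.0, 0.0), 0.5), ((1.0, 1.0), 0.5)]
mixture = {}
for (a, b) in states:
    mixture[(a, b)] = sum(
        w * (q1 if a else 1 - q1) * (q2 if b else 1 - q2)
        for (q1, q2), w in components
    )

print(factorized[(0, 1)])  # 0.25: mass leaked onto an impossible state
print(mixture[(0, 1)])     # 0.0: the mixture captures the correlation
```

This is the scalability argument in miniature: each mixture component stays factorized (cheap to evaluate and sample), yet the mixture as a whole represents dimensional correlations that no single factorized model can.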
Problem

Research questions and friction points this paper is trying to address.

Discrete Diffusion Models
Sampling Speed
High-Dimensional Distributions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixture Models
Dimensional Correlations
Loss Functions