Learnable Sampler Distillation for Discrete Diffusion Models

📅 2025-09-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Discrete diffusion models (DDMs) generate discrete data such as text and molecules, but sampling is slow because it requires a large number of steps. Taking larger steps speeds up inference at the cost of quality, since it amplifies both the compounding decoding error from factorized predictions and the discretization error from numerical approximation. To address this, we propose learnable sampler distillation (LSD): a few-step student sampler learns to align its intermediate score trajectory with that of a many-step teacher sampler by optimizing learnable sampler coefficients that adjust the sampling dynamics. An extension, LSD+, additionally learns a time schedule that allocates steps non-uniformly. In experiments on text generation, image generation, and synthetic tasks, both variants outperform existing DDM samplers, reaching substantially higher sampling quality with significantly fewer sampling steps.

📝 Abstract

Discrete diffusion models (DDMs) have shown powerful generation ability for discrete data modalities like text and molecules. However, their practical application is hindered by inefficient sampling, which requires a large number of sampling steps. Accelerating DDMs by using larger step sizes typically leads to a significant decrease in sampling quality, as it amplifies both the compounding decoding error due to factorized predictions and the discretization error from numerical approximations. To address these challenges, we propose learnable sampler distillation (LSD), a novel approach to train fast and high-fidelity samplers for DDMs. LSD employs a distillation approach in which a student sampler with a few steps learns to align its intermediate score trajectory with that of a high-quality teacher sampler with numerous steps. This alignment is achieved by optimizing learnable sampler coefficients that adaptively adjust sampling dynamics. We further propose LSD+, which also learns time schedules that allocate steps non-uniformly. Experiments across text generation, image generation, and synthetic tasks demonstrate that our proposed approaches outperform existing samplers for DDMs, achieving substantially higher sampling quality with significantly fewer sampling steps. Our code is available at https://github.com/feiyangfu/LSD.
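To make the distillation idea concrete, here is a toy analogy, not the paper's algorithm. It assumes a scalar ODE, dx/dt = -x, in place of a discrete diffusion process, and a squared-error loss on states in place of the score-trajectory alignment described above. A 1000-step Euler "teacher" supplies intermediate targets, and a 4-step "student" learns one coefficient per step so that its coarse trajectory matches them. All function names are hypothetical.

```python
def teacher_trajectory(x0, n_steps, checkpoints):
    """Fine-grained Euler on dx/dt = -x over t in [0, 1].

    Returns the start state followed by the states at the given step indices.
    """
    h = 1.0 / n_steps
    x = x0
    states = [x0]
    for i in range(1, n_steps + 1):
        x = x - h * x
        if i in checkpoints:
            states.append(x)
    return states


def run_student(x0, coeffs, h):
    """Coarse Euler where step k is rescaled by a coefficient coeffs[k]."""
    xs = [x0]
    for c in coeffs:
        xs.append(xs[-1] - c * h * xs[-1])
    return xs


def fit_coefficients(x0, targets, h, lr=5.0, iters=500):
    """Fit one coefficient per student step by plain gradient descent.

    Each step minimizes the squared gap between the student's next state
    and the teacher's state at the same time point.
    """
    coeffs = []
    x = x0
    for target in targets:
        c = 1.0  # start from the unmodified Euler step
        for _ in range(iters):
            residual = (x - c * h * x) - target
            grad = 2.0 * residual * (-h * x)  # d(residual^2)/dc
            c -= lr * grad
        coeffs.append(c)
        x = x - c * h * x  # continue from the student's own state
    return coeffs


n_student = 4
h = 1.0 / n_student
teacher = teacher_trajectory(1.0, 1000, {250, 500, 750, 1000})

plain = run_student(1.0, [1.0] * n_student, h)
coeffs = fit_coefficients(1.0, teacher[1:], h)
tuned = run_student(1.0, coeffs, h)

err_plain = max(abs(a - b) for a, b in zip(plain, teacher))
err_tuned = max(abs(a - b) for a, b in zip(tuned, teacher))
print("coefficients:", coeffs)
print("max trajectory error, plain vs tuned:", err_plain, err_tuned)
```

In this toy setting the fitted coefficients shrink each coarse step slightly, which removes most of the discretization gap without adding steps. The real method operates on a DDM's score trajectory rather than on scalar states.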

Problem

Research questions and friction points this paper is trying to address.

Accelerating discrete diffusion models while maintaining generation quality

Addressing compounding decoding errors from factorized predictions

Reducing discretization errors caused by numerical approximations

Innovation

Methods, ideas, or system contributions that make the work stand out.

Distillation aligns a few-step student sampler's intermediate score trajectory with that of a many-step teacher

Learnable coefficients adaptively adjust sampling dynamics

Learned time schedules allocate steps non-uniformly
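One common way to make a time schedule learnable is to parameterize the step sizes with unconstrained logits and pass them through a softmax, so the resulting grid is always monotone and spans the full interval. The sketch below illustrates that parameterization only. Whether LSD+ uses this exact form is an assumption, and `schedule_from_logits` is a hypothetical name.

```python
import math


def schedule_from_logits(logits, t_start=1.0, t_end=0.0):
    """Map unconstrained logits to a strictly decreasing time grid.

    Softmax turns the logits into positive fractions that sum to 1;
    each fraction is the share of [t_end, t_start] covered by one step.
    """
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    fractions = [e / total for e in exps]

    times = [t_start]
    covered = 0.0
    for f in fractions:
        covered += f
        times.append(t_start - covered * (t_start - t_end))
    times[-1] = t_end  # guard against floating-point rounding
    return times


uniform = schedule_from_logits([0.0, 0.0, 0.0, 0.0])
skewed = schedule_from_logits([2.0, 1.0, 0.0, -1.0])
print("uniform:", uniform)
print("skewed: ", skewed)
```

Equal logits recover the uniform grid, while unequal logits spend larger steps where the logits are larger. Because the mapping is differentiable, the logits could be trained jointly with the sampler coefficients.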
