🤖 AI Summary
Discrete diffusion models (DDMs) generate discrete data such as text and molecules well, but sampling is slow because it requires many steps. Taking larger steps speeds up inference but degrades quality, since it amplifies both the compounding decoding error from factorized predictions and the discretization error from numerical approximation. To address this, we propose learnable sampler distillation (LSD), a distillation-based method for training fast, high-fidelity samplers. A few-step student sampler learns to align its intermediate score trajectory with that of a many-step teacher sampler by optimizing learnable sampler coefficients that adapt the sampling dynamics. An extension, LSD+, additionally learns a non-uniform time schedule. On text generation, image generation, and synthetic tasks, the proposed samplers outperform existing DDM samplers, reaching higher sample quality with fewer sampling steps.
📝 Abstract
Discrete diffusion models (DDMs) have shown powerful generation ability for discrete data modalities like text and molecules. However, their practical application is hindered by inefficient sampling, which requires a large number of sampling steps. Accelerating DDMs by using larger step sizes typically leads to a significant decrease in sampling quality, as it amplifies the impact of both the compounding decoding error due to factorized predictions and the discretization error from numerical approximations. To address these challenges, we propose learnable sampler distillation (LSD), a novel approach to train fast and high-fidelity samplers for DDMs. LSD employs a distillation approach in which a student sampler with a few steps learns to align its intermediate score trajectory with that of a high-quality teacher sampler with numerous steps. This alignment is achieved by optimizing learnable sampler coefficients that adaptively adjust sampling dynamics. We further propose LSD+, which also learns time schedules that allocate steps non-uniformly. Experiments across text generation, image generation, and synthetic tasks demonstrate that our proposed approaches outperform existing samplers for DDMs, achieving substantially higher sampling quality with significantly fewer sampling steps. Our code is available at https://github.com/feiyangfu/LSD.
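To make the distillation idea concrete, here is a toy sketch of a few-step student whose per-step coefficients are fitted so that its intermediate states match those of a many-step teacher. It is not the authors' implementation (the repository above is the reference for that). Everything in it is an assumption made for illustration: the linear dynamics dp/dt = rate * (q - p) over a three-state probability vector stand in for a real DDM reverse process, the per-step squared-error objective stands in for the paper's trajectory-alignment loss, and the names `euler_step`, `run_teacher`, `distill_coefficients`, and `run_student` are hypothetical.

```python
# Toy illustration only: linear dynamics over a probability vector,
# NOT the discrete diffusion sampler from the paper.

def euler_step(p, q, rate, h, coeff=1.0):
    """One Euler step of dp/dt = rate * (q - p), scaled by a sampler coefficient."""
    return [pi + coeff * h * rate * (qi - pi) for pi, qi in zip(p, q)]


def run_teacher(p0, q, rate, total_time, n_steps, keep_every):
    """Many-step sampler; records its state every `keep_every` fine steps."""
    h = total_time / n_steps
    p, trajectory = list(p0), []
    for i in range(1, n_steps + 1):
        p = euler_step(p, q, rate, h)
        if i % keep_every == 0:
            trajectory.append(list(p))
    return trajectory


def distill_coefficients(p0, q, rate, total_time, targets, iters=200, lr=0.5):
    """Fit one coefficient per student step by gradient descent so that each
    student state matches the teacher state at the same time point."""
    h = total_time / len(targets)
    coeffs, p = [], list(p0)
    for target in targets:
        # Unscaled student update direction for this step.
        d = [h * rate * (qi - pi) for pi, qi in zip(p, q)]
        dd = sum(x * x for x in d)
        c = 1.0  # start from the plain (uncorrected) sampler
        for _ in range(iters):
            resid = [pi + c * di - ti for pi, di, ti in zip(p, d, target)]
            grad = 2.0 * sum(di * ri for di, ri in zip(d, resid))
            c -= lr * grad / (2.0 * dd)  # gradient step, normalised by curvature
        coeffs.append(c)
        p = [pi + c * di for pi, di in zip(p, d)]
    return coeffs


def run_student(p0, q, rate, total_time, coeffs):
    """Few-step sampler that applies one coefficient per coarse step."""
    h = total_time / len(coeffs)
    p = list(p0)
    for c in coeffs:
        p = euler_step(p, q, rate, h, coeff=c)
    return p


def sq_err(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


p0 = [1.0, 0.0, 0.0]   # starting distribution (assumed)
q = [0.2, 0.3, 0.5]    # distribution the dynamics relax toward (assumed)
rate, total_time = 3.0, 1.0

teacher_traj = run_teacher(p0, q, rate, total_time, n_steps=1000, keep_every=250)
coeffs = distill_coefficients(p0, q, rate, total_time, teacher_traj)

plain = run_student(p0, q, rate, total_time, [1.0] * 4)
learned = run_student(p0, q, rate, total_time, coeffs)
print("learned coefficients:", coeffs)
print("plain 4-step error:  ", sq_err(plain, teacher_traj[-1]))
print("learned 4-step error:", sq_err(learned, teacher_traj[-1]))
```

In this toy setting the fitted coefficients come out below 1, which shrinks each coarse step so that four student steps land where 1000 teacher steps do. A real DDM sampler has no such closed-form structure, which is why the coefficients are trained there rather than derived.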