IDLM: Inverse-distilled Diffusion Language Models

📅 2026-02-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Diffusion language models suffer from low inference efficiency due to multi-step sampling, which hinders their practical deployment. This work presents the first extension of inverse distillation to discrete text generation, introducing a gradient-stabilized relaxation strategy to enable efficient training and establishing theoretical guarantees for the uniqueness of the inverse distillation solution in the discrete domain. The proposed method achieves 4× to 64× inference speedup across various diffusion language models while effectively preserving the teacher model’s generation entropy and perplexity, thereby substantially improving inference efficiency without compromising output quality.

📝 Abstract
Diffusion Language Models (DLMs) have recently achieved strong results in text generation. However, their multi-step sampling leads to slow inference, limiting practical use. To address this, we extend Inverse Distillation, a technique originally developed to accelerate continuous diffusion models, to the discrete setting. This extension introduces both theoretical and practical challenges. From a theoretical perspective, the inverse distillation objective lacks uniqueness guarantees, which may lead to suboptimal solutions. From a practical standpoint, backpropagation through the discrete space is non-trivial and often unstable. To overcome these challenges, we first provide a theoretical result demonstrating that our inverse formulation admits a unique solution, thereby ensuring valid optimization. We then introduce gradient-stable relaxations to support effective training. Experiments on multiple DLMs show that our method, Inverse-distilled Diffusion Language Models (IDLM), reduces the number of inference steps by 4× to 64× while preserving the teacher model's entropy and generative perplexity.
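The abstract does not specify which gradient-stable relaxation the paper uses. A common way to make backpropagation through discrete token sampling stable is the Gumbel-softmax relaxation, sketched below as an illustrative assumption (the function name and the choice of relaxation are mine, not from the paper): a temperature `tau` trades off between a hard one-hot token sample and a smooth, differentiable approximation.

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, seed=None):
    """Relaxed categorical sample over vocabulary logits.

    Adds Gumbel(0, 1) noise (the reparameterization trick for
    categoricals) and applies a temperature-scaled softmax. As
    tau -> 0 the output approaches a hard one-hot token; larger
    tau yields smoother samples with lower-variance gradients.
    """
    rng = np.random.default_rng(seed)
    # Gumbel(0, 1) noise via the inverse-CDF transform.
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))
    y = (logits + g) / tau
    y = y - y.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(y)
    return e / e.sum(axis=-1, keepdims=True)

# Toy vocabulary of 3 tokens.
logits = np.array([2.0, 0.5, -1.0])
soft = gumbel_softmax(logits, tau=0.5, seed=0)
# A straight-through variant would forward the hard one-hot token
# while letting gradients flow through the soft sample.
hard = np.eye(len(logits))[soft.argmax()]
```

In a straight-through setup, the hard one-hot vector is used in the forward pass while gradients are taken with respect to the soft relaxation, which is one standard recipe for stabilizing training in discrete diffusion models.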
Problem

Research questions and friction points this paper is trying to address.

Diffusion Language Models
Inverse Distillation
Discrete Diffusion
Inference Acceleration
Text Generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Inverse Distillation
Diffusion Language Models
Discrete Diffusion
Gradient-stable Relaxation
Fast Inference