Ultra-Fast Language Generation via Discrete Diffusion Divergence Instruct

📅 2025-09-29
🤖 AI Summary
This work addresses the slow generation speed and training instability of discrete diffusion language models (dLLMs). It proposes DiDi-Instruct, a method grounded in an integral KL-divergence minimization framework that combines grouped reward normalization, intermediate-state matching, and reward-guided ancestral sampling (RGAS), significantly improving both training efficiency and generation quality. On OpenWebText, DiDi-Instruct achieves sample perplexities ranging from 62.2 with 8 function evaluations (NFEs) to 18.4 with 128 NFEs, outperforming GPT-2 and standard dLLMs. It accelerates generation by 64× with a negligible entropy loss (about 1%) and requires 20× less additional training wall-clock time. Its core contribution is the first integration of KL-divergence optimization with a multi-stage reward alignment mechanism, enabling ultra-fast, stable, and high-quality discrete text generation.
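For orientation, an integral KL divergence in the style of Diff-Instruct integrates a time-weighted KL divergence between the student's and the teacher's marginals along the forward corruption process. The block below uses generic notation; the student $p_\theta$, teacher $q$, weight $w(t)$, and time horizon $[0, 1]$ are placeholder assumptions, not necessarily the paper's exact definitions.

```latex
\mathcal{D}_{\mathrm{IKL}}\big(p_\theta \,\|\, q\big)
  = \int_{0}^{1} w(t)\, \mathcal{D}_{\mathrm{KL}}\big(p_{\theta,t} \,\|\, q_{t}\big)\,\mathrm{d}t,
\qquad w(t) \ge 0,

\mathcal{D}_{\mathrm{KL}}\big(p_{\theta,t} \,\|\, q_{t}\big)
  = \sum_{x_t} p_{\theta,t}(x_t)\,\log\frac{p_{\theta,t}(x_t)}{q_{t}(x_t)}
```

Here $p_{\theta,t}$ and $q_t$ denote the distributions of partially masked sequences $x_t$ obtained by corrupting student and teacher samples with the same masking process up to time $t$. Because the sequences are discrete, the inner KL is a sum over $x_t$ rather than an integral.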

📝 Abstract
Fast language generation is a holy grail of the AI era. In this work, we introduce Discrete Diffusion Divergence Instruct (DiDi-Instruct), a training-based method that yields fast language generation models by initializing from a pre-trained (masked) discrete diffusion language model (dLLM). The resulting DiDi-Instruct model outperforms its dLLM counterparts and the GPT-2 baseline while achieving a 64x speedup. On the theoretical side, we ground DiDi-Instruct in a framework of integral KL-divergence minimization and derive practical training algorithms. We also introduce grouped reward normalization, intermediate-state matching, and the reward-guided ancestral sampler (RGAS), which significantly improve training stability, model coverage, and inference performance. On OpenWebText, DiDi-Instruct outperforms all accelerated language generation models as well as the GPT-2 baseline and the standard dLLMs, achieving sample perplexities ranging from 62.2 at 8 function evaluations (NFEs) to 18.4 at 128 NFEs. These gains come with a negligible entropy loss of about 1% and 20x less additional training wall-clock time. We further validate the robustness and effectiveness of DiDi-Instruct through extensive ablation studies, model scaling, and the generation of discrete protein sequences. In conclusion, DiDi-Instruct is an efficient yet effective distillation method, enabling language generation in the blink of an eye. We will release both code and models at github.com/haoyangzheng-ai/didi-instruct.
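The abstract reports quality at a given number of function evaluations (NFEs). As a minimal sketch, assuming a simplified masked-diffusion sampler that makes one network forward pass per step and reveals a fixed share of the remaining masked positions each time, the NFE count is simply the number of denoising steps actually taken. `toy_denoiser`, `sample_masked_diffusion`, and the `MASK = -1` sentinel are hypothetical names, not part of the DiDi-Instruct release; the toy denoiser returns random probabilities, and the sketch omits RGAS and any reward guidance.

```python
import numpy as np

MASK = -1  # hypothetical sentinel id for the [MASK] token


def toy_denoiser(tokens, vocab_size, rng):
    """Hypothetical stand-in for a trained dLLM forward pass.

    A real model would condition on `tokens`; this toy ignores them and
    returns random per-position categorical distributions over the vocabulary.
    """
    logits = rng.normal(size=(len(tokens), vocab_size))
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


def sample_masked_diffusion(seq_len, vocab_size, num_steps, seed=0):
    """Few-step unmasking sampler; returns (tokens, nfe).

    Each loop iteration makes exactly one denoiser call, so nfe <= num_steps
    (fewer only if every position is revealed early, i.e. seq_len < num_steps).
    """
    rng = np.random.default_rng(seed)
    tokens = np.full(seq_len, MASK, dtype=np.int64)
    nfe = 0
    for step in range(num_steps):
        masked = np.flatnonzero(tokens == MASK)
        if masked.size == 0:
            break  # everything already revealed; no further NFEs needed
        probs = toy_denoiser(tokens, vocab_size, rng)
        nfe += 1
        # Reveal an equal share of the remaining masks, so the final step
        # (num_steps - step == 1) reveals all of them.
        n_reveal = int(np.ceil(masked.size / (num_steps - step)))
        for i in rng.choice(masked, size=n_reveal, replace=False):
            tokens[i] = rng.choice(vocab_size, p=probs[i])
    return tokens, nfe


if __name__ == "__main__":
    tokens, nfe = sample_masked_diffusion(seq_len=32, vocab_size=50, num_steps=8)
    print(f"NFEs used: {nfe}; any masks left: {(tokens == MASK).any()}")
```

With `seq_len=32` and `num_steps=8`, each step reveals 4 positions and the sequence is complete after 8 NFEs. A distilled few-step model aims to make this small-`num_steps` regime produce samples as good as those of a many-step teacher; the equal-share unmasking schedule used here is only one simple choice among several.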

Problem

Research questions and friction points this paper is trying to address.

Accelerating language generation from discrete diffusion models
Improving training stability and inference performance simultaneously
Achieving faster text generation with minimal quality loss

Innovation

Methods, ideas, or system contributions that make the work stand out.

Initializes from a pre-trained (masked) discrete diffusion language model
Uses an integral KL-divergence minimization framework for training
Implements grouped reward normalization and reward-guided ancestral sampling
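Grouped reward normalization is named but not spelled out here. A minimal sketch, assuming it resembles the group-relative standardization used in GRPO-style reinforcement learning, where the rewards of several samples that share a context are centered and scaled within their group so the training signal has a comparable scale across groups. The function name, the contiguous-group layout, and `eps` are hypothetical, and the paper's exact formulation may differ.

```python
import numpy as np


def group_normalize_rewards(rewards, group_size, eps=1e-8):
    """Standardize rewards within consecutive groups of `group_size` samples.

    Assumption: samples drawn for the same context are stored contiguously,
    so reshaping to (num_groups, group_size) recovers the groups. Each group
    is shifted to zero mean and scaled to (roughly) unit std; a constant
    group maps to all zeros thanks to `eps`.
    """
    r = np.asarray(rewards, dtype=float)
    if r.size % group_size != 0:
        raise ValueError("number of rewards must be a multiple of group_size")
    r = r.reshape(-1, group_size)
    mean = r.mean(axis=1, keepdims=True)
    std = r.std(axis=1, keepdims=True)
    return ((r - mean) / (std + eps)).reshape(-1)


if __name__ == "__main__":
    # Two groups whose raw rewards differ in scale by 10x.
    raw = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
    print(group_normalize_rewards(raw, group_size=3).round(3))
```

In the usage example, both groups map to essentially the same normalized values (about -1.22, 0, 1.22) even though their raw rewards differ tenfold. Removing per-group scale in this way is one plausible route to the training-stability gains the abstract attributes to the technique.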