Fast and Fluent Diffusion Language Models via Convolutional Decoding and Rejective Fine-tuning

📅 2025-09-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Diffusion language models suffer from semantic drift, repetition, and incoherence in long-text generation due to contextual decay induced by large decoding windows. To address this, we propose a convolutional normalized decoding mechanism that captures long-range dependencies via localized receptive fields—enabling effective context compression without chunking—and a rejection-based rule fine-tuning strategy that imposes explicit semantic consistency constraints during post-training. Together, these components enhance contextual fidelity and fluency in distant-token generation. Experiments demonstrate state-of-the-art performance on open-generation benchmarks (e.g., AlpacaEval), with a 37% reduction in generation steps, 2.1× inference speedup, and significant improvements in coherence and relevance.

Technology Category

Application Category

📝 Abstract

Autoregressive (AR) language models generate text one token at a time, which limits their inference speed. Diffusion-based language models offer a promising alternative, as they can decode multiple tokens in parallel. However, we identify a key bottleneck in current diffusion LMs: the long decoding-window problem, where tokens generated far from the input context often become irrelevant or repetitive. Previous solutions like semi-autoregressive address this issue by splitting windows into blocks, but this sacrifices speed and bidirectionality, eliminating the main advantage of diffusion models. To overcome this, we propose Convolutional decoding (Conv), a normalization-based method that narrows the decoding window without hard segmentation, leading to better fluency and flexibility. Additionally, we introduce Rejecting Rule-based Fine-Tuning (R2FT), a post-hoc training scheme that better aligns tokens at positions far from context. Our methods achieve state-of-the-art results on open-ended generation benchmarks (e.g., AlpacaEval) among diffusion LM baselines, with significantly lower step size than previous works, demonstrating both speed and quality improvements.

Problem

Research questions and friction points this paper is trying to address.

Addressing long decoding-window problem in diffusion language models

Improving fluency and relevance of distantly generated tokens

Maintaining parallel decoding speed while enhancing text quality

Innovation

Methods, ideas, or system contributions that make the work stand out.

Convolutional decoding for window narrowing

Rejective fine-tuning for token alignment

Parallel token generation via diffusion models

🔎 Similar Papers

No similar papers found.

Authors to Follow