Diffusion Language Models Are Natively Length-Aware

📅 2026-03-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Diffusion language models typically employ a fixed context window and a predetermined number of denoising steps, leading to substantial redundant computation when generating short texts. This work presents the first observation that the latent prompt representations inherently encode output-length information. Leveraging this insight, the authors propose a training-free, zero-shot mechanism that predicts the target sequence length before generation and truncates the context window accordingly, thereby adaptively reducing the number of denoising steps. Evaluated on four benchmarks -- GSM8K, HumanEval, IFEval, and LongFormQA -- the method significantly reduces FLOPs without statistically significant performance degradation, and even achieves notable improvements on two of the tasks.
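The mechanism described above can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's implementation: it assumes a simple linear probe over the mean-pooled prompt embedding as the length estimator, and a denoising-step count proportional to the cropped window size; `probe_weights`, `steps_per_token`, and `denoise_step` are all made-up names for this sketch.

```python
import numpy as np

def predict_length(prompt_embedding, probe_weights, max_len=1024):
    """Hypothetical linear probe: estimate the required output length
    from the mean-pooled latent prompt representation. (Assumption for
    illustration; the paper's exact estimator is not specified here.)"""
    raw = float(prompt_embedding.mean(axis=0) @ probe_weights)
    return int(np.clip(raw, 1, max_len))

def generate_with_cropping(prompt_embedding, probe_weights, denoise_step,
                           steps_per_token=0.25, max_len=1024):
    """Crop the context window to the predicted length before denoising,
    so fewer positions are processed and fewer diffusion steps are run
    than with a fixed max_len window."""
    n = predict_length(prompt_embedding, probe_weights, max_len)
    x = np.zeros(n)  # cropped window of size n, not max_len
    num_steps = max(1, int(n * steps_per_token))  # steps scale with n
    for _ in range(num_steps):
        x = denoise_step(x)
    return x, n, num_steps
```

With a short predicted length, both the per-step cost (smaller window) and the number of steps shrink, which is where the reported FLOP savings would come from.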

📝 Abstract
Unlike autoregressive language models, which terminate variable-length generation upon predicting an End-of-Sequence (EoS) token, Diffusion Language Models (DLMs) operate over a fixed maximum-length context window for a predetermined number of denoising steps. However, this process is independent of the required response length, resulting in computational waste for the majority of short responses common in reasoning and chat tasks. To address this problem, we conjecture that the latent prompt representation contains sufficient information to estimate the required output length. We provide empirical evidence for this phenomenon and propose a zero-shot mechanism to dynamically crop the context window before generation begins, leading to fewer diffusion steps and substantial computational savings. We evaluate our approach on four benchmarks with diverse tasks -- GSM8K (reasoning), HumanEval (code generation), IFEval (instruction following), and LongFormQA (question answering) -- revealing massive efficiency gains at minimal performance impact. We report significant reductions in FLOPs across all tasks, with no statistically significant performance degradation, and significant performance improvements on 2 out of 4 tasks.
Problem

Research questions and friction points this paper is trying to address.

Diffusion Language Models
Length-aware Generation
Computational Efficiency
Fixed Context Window
Response Length
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion Language Models
Length-aware Generation
Zero-shot Length Estimation
Dynamic Context Cropping
Computational Efficiency
Vittorio Rossi, Giacomo Cirò, Davide Beltrame, Luca Gandolfi
Department of Computing Sciences, Bocconi University, Milan, Italy
Paul Röttger
Postdoctoral Researcher, Bocconi University
Large Language Models, Safety and Societal Impacts of AI Systems
Dirk Hovy
Bocconi University
Natural Language Processing, Machine Learning, Computational Sociolinguistics, Computational Social Science, Ethics in NLP