Diffusion Language Models Are Natively Length-Aware

📅 2026-03-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Diffusion language models typically employ a fixed context window and a predetermined number of denoising steps, leading to substantial redundant computation when generating short texts. This work presents the first observation that the latent prompt representations inherently encode output-length information. Leveraging this insight, the authors propose a training-free, zero-shot mechanism that predicts the target sequence length before generation and truncates the context window accordingly, thereby adaptively reducing the number of denoising steps. Evaluated on four benchmarks -- GSM8K, HumanEval, IFEval, and LongFormQA -- the method significantly reduces FLOPs without statistically significant performance degradation, and even achieves notable improvements on two of the tasks.
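The mechanism described above can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's implementation: it assumes a simple linear probe over the mean-pooled prompt embedding as the length estimator, and a denoising-step count proportional to the cropped window size; `probe_weights`, `steps_per_token`, and `denoise_step` are all made-up names for this sketch.

```python
import numpy as np

def predict_length(prompt_embedding, probe_weights, max_len=1024):
    """Hypothetical linear probe: estimate the required output length
    from the mean-pooled latent prompt representation. (Assumption for
    illustration; the paper's exact estimator is not specified here.)"""
    raw = float(prompt_embedding.mean(axis=0) @ probe_weights)
    return int(np.clip(raw, 1, max_len))

def generate_with_cropping(prompt_embedding, probe_weights, denoise_step,
                           steps_per_token=0.25, max_len=1024):
    """Crop the context window to the predicted length before denoising,
    so fewer positions are processed and fewer diffusion steps are run
    than with a fixed max_len window."""
    n = predict_length(prompt_embedding, probe_weights, max_len)
    x = np.zeros(n)  # cropped window of size n, not max_len
    num_steps = max(1, int(n * steps_per_token))  # steps scale with n
    for _ in range(num_steps):
        x = denoise_step(x)
    return x, n, num_steps
```

With a short predicted length, both the per-step cost (smaller window) and the number of steps shrink, which is where the reported FLOP savings would come from.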

📝 Abstract
Unlike autoregressive language models, which terminate variable-length generation upon predicting an End-of-Sequence (EoS) token, Diffusion Language Models (DLMs) operate over a fixed maximum-length context window for a predetermined number of denoising steps. However, this process is independent of the required response length, resulting in computational waste for the majority of short responses common in reasoning and chat tasks. To address this problem, we conjecture that the latent prompt representation contains sufficient information to estimate the required output length. We provide empirical evidence for this phenomenon and propose a zero-shot mechanism to dynamically crop the context window before generation begins, leading to fewer diffusion steps and substantial computational savings. We evaluate our approach on four benchmarks with diverse tasks -- GSM8K (reasoning), HumanEval (code generation), IFEval (instruction following), and LongFormQA (question answering) -- revealing massive efficiency gains at minimal performance impact. We report significant reductions in FLOPs across all tasks, with no statistically significant performance degradation, and significant performance improvements on 2 out of 4 tasks.
Problem

Research questions and friction points this paper is trying to address.

Diffusion Language Models
Length-aware Generation
Computational Efficiency
Fixed Context Window
Response Length
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion Language Models
Length-aware Generation
Zero-shot Length Estimation
Dynamic Context Cropping
Computational Efficiency
Vittorio Rossi, Giacomo Cirò, Davide Beltrame, Luca Gandolfi
Department of Computing Sciences, Bocconi University, Milan, Italy
Paul Röttger
Postdoctoral Researcher, Bocconi University
Large Language Models, Safety and Societal Impacts of AI Systems
Dirk Hovy
Bocconi University
Natural Language Processing, Machine Learning, Computational Sociolinguistics, Computational Social Science, Ethics in NLP