From Bits to Rounds: Parallel Decoding with Exploration for Diffusion Language Models

📅 2025-11-26

📈 Citations: 0

✨ Influential: 0

career value

209K/year

🤖 AI Summary

Diffusion language models (DLMs) suffer from low information efficiency in high-confidence token decoding, resulting in excessive parallel decoding steps and slow generation. Method: We propose a “bits-to-steps” theoretical framework that establishes, for the first time, a linear relationship between decoding steps and cumulative information gain. Leveraging this insight, we design a training-free exploration-exploitation decoding strategy featuring cross-block decoding, conditional distribution reshaping, and active exploration of high-uncertainty tokens—enabling uncertainty-guided cascaded prediction. Contribution/Results: Our method reduces decoding steps by 38% on average, significantly improving throughput while preserving generation quality comparable to standard decoding (BLEU/ROUGE degradation < 0.5). The core contribution is the establishment of an information-theoretic foundation for DLM decoding and the introduction of the first training-free, plug-and-play, information-maximization-based parallel decoding paradigm for DLMs.

Technology Category

Application Category

📝 Abstract

Diffusion Language Models (DLMs) have recently emerged as a strong alternative to autoregressive language models (LMs). DLMs offer comparable accuracy with faster inference speed via parallel decoding. However, standard DLM decoding strategies relying on high-confidence tokens encounter an inherent information-theoretic bottleneck that restricts decoding progress and ultimately slows generation. We demonstrate both theoretically and empirically that prioritizing high-confidence tokens is inherently inefficient. High-probability tokens carry negligible information and strictly relying on them limits the effective progress made in each decoding round. We prove that the number of decoding rounds must grow linearly with the sample's total information (negative log-likelihood) and inversely with the per-round information budget, establishing a bits-to-rounds principle. We also propose Explore-Then-Exploit (ETE), a training-free decoding strategy that maximizes information throughput and decoding efficiency. ETE combines cross-block decoding with targeted exploration of high-uncertainty tokens to reshape the conditional distribution and trigger cascades of confident predictions. Experiments verify our theoretical bounds and demonstrate that ETE consistently reduces the required number of decoding rounds compared to confidence-only baselines without compromising generation quality.

Problem

Research questions and friction points this paper is trying to address.

DLMs face information bottleneck from high-confidence token prioritization

Standard decoding strategies inefficiently limit per-round information progress

Explore-Then-Exploit strategy maximizes decoding efficiency while maintaining quality

Innovation

Methods, ideas, or system contributions that make the work stand out.

Parallel decoding strategy for diffusion language models

Explore-Then-Exploit method combining cross-block decoding

Targeted exploration of high-uncertainty tokens

🔎 Similar Papers

Speculative Diffusion Decoding: Accelerating Language Generation through Diffusion