🤖 AI Summary
Existing diffusion-based language models typically generate text of a predefined, fixed length and therefore cannot flexibly produce variable-length output. Current approaches to variable-length generation either require retraining or depend solely on local confidence scores, and so struggle to balance output quality with adaptability. This work proposes a training-free, Bayesian structured decoding framework that formulates variable-length text generation as a dynamic structural inference problem, jointly inferring the extension length, the block boundaries, and the decoding schedule. By integrating uncertainty estimates with structural signals and employing a dynamic window expansion mechanism, the method supports flexible block-level generation while preserving textual coherence. Experiments across multiple benchmarks show that the proposed approach significantly outperforms both fixed-length and existing variable-length baselines in generation quality and length adaptability.
📝 Abstract
Diffusion language models (DLMs) have recently emerged as a promising alternative to autoregressive models, primarily because they enable parallel decoding. Despite this advantage, most existing DLMs rely on a fixed generation length specified before decoding, which restricts their flexibility in real-world applications. A few recent works attempt to support flexible-length generation, but they have notable limitations: some require costly retraining to accommodate variable-length outputs, while others depend solely on local confidence signals during decoding. Such local criteria fail to capture the evolving structure of the sequence, often resulting in suboptimal generation quality. In this paper, we propose a training-free, Bayesian structured decoding framework that formulates flexible-length generation as a dynamic structural inference problem, jointly inferring the expansion length, the block boundaries, and the decoding schedule. At each window expansion step, the method integrates local uncertainty with structural signals via a unified mechanism that supports dynamic structured generation, including both flexible block expansion and block organization, while maintaining coherence. Extensive experiments across multiple benchmarks demonstrate that our approach significantly improves generation quality and flexibility over existing fixed-length and flexible-length baselines. These results highlight the advantage of Bayesian structured decoding for diffusion language models, providing a principled and efficient solution for structured text generation.
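The abstract does not spell out the inference rule, so the following is only a toy illustration of the general idea of treating expansion length as a quantity to infer rather than a fixed hyperparameter. It is not the paper's algorithm. The function names, the uniform prior, and the use of a per-position end-of-text probability as the structural signal are all assumptions made for this sketch.

```python
from math import prod


def posterior_over_lengths(prior, p_end):
    """Posterior over "the extension ends after k new tokens", k = 1..len(p_end).

    prior[k-1]: prior belief that the extension has length k (hypothetical input).
    p_end[k-1]: estimated probability that position k ends the text, e.g. an
                end-of-sequence probability (an assumed structural signal).
    """
    if len(prior) != len(p_end):
        raise ValueError("prior and p_end must have the same length")
    unnormalized = []
    for k in range(len(p_end)):
        # Likelihood of ending exactly at position k+1:
        # no end at any earlier position, then an end here.
        likelihood = prod(1.0 - p for p in p_end[:k]) * p_end[k]
        unnormalized.append(prior[k] * likelihood)
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("all candidate lengths have zero posterior mass")
    return [u / total for u in unnormalized]


def map_length(posterior):
    """Return the most probable extension length (1-indexed)."""
    return max(range(len(posterior)), key=posterior.__getitem__) + 1


# Hypothetical numbers: four candidate lengths with a uniform prior.
prior = [0.25, 0.25, 0.25, 0.25]
p_end = [0.1, 0.2, 0.9, 0.5]
posterior = posterior_over_lengths(prior, p_end)
best = map_length(posterior)
```

Under these made-up inputs the posterior concentrates on the position where the end-of-text signal is strong and no earlier position has already ended the text. A purely local confidence threshold applied to each position in isolation would not account for that dependence on earlier positions.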