🤖 AI Summary
Diffusion language models (DLMs) offer parallel decoding and controllability, but they lag behind autoregressive models in likelihood modeling and are restricted to fixed-length generation. To address these limitations, we propose a block diffusion language model: a discrete diffusion framework that incorporates autoregressive structure across blocks to enable variable-length text generation and efficient parallel sampling. Our key contributions are: (1) a block-wise denoising paradigm that interpolates between discrete diffusion and autoregressive modeling, enabling arbitrary-length sequence generation; (2) an efficient training algorithm with gradient-variance estimators and data-driven noise schedules that minimize this variance, substantially improving training stability; and (3) KV caching and parallel token sampling to accelerate inference. On standard language modeling benchmarks, block diffusion sets a new state of the art among diffusion-based LMs, achieving superior trade-offs among generation quality, inference efficiency, and controllability. The code, pretrained weights, and technical documentation are publicly released.
📝 Abstract
Diffusion language models offer unique benefits over autoregressive models due to their potential for parallelized generation and controllability, yet they lag in likelihood modeling and are limited to fixed-length generation. In this work, we introduce a class of block diffusion language models that interpolate between discrete denoising diffusion and autoregressive models. Block diffusion overcomes key limitations of both approaches by supporting flexible-length generation and improving inference efficiency with KV caching and parallel token sampling. We propose a recipe for building effective block diffusion models that includes an efficient training algorithm, estimators of gradient variance, and data-driven noise schedules to minimize that variance. Block diffusion sets a new state-of-the-art performance among diffusion models on language modeling benchmarks and enables generation of arbitrary-length sequences. We provide the code, along with the model weights and a blog post, on the project page: https://m-arriola.com/bd3lms/
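To make the generation scheme described in the abstract concrete, here is a minimal toy sketch of block-wise diffusion sampling. It is not the paper's implementation: the denoiser is a random stand-in, and all names (`MASK`, `denoise_step`, `generate`) are hypothetical. The structural point it illustrates is that tokens are generated one block at a time (so total length is flexible), each block is denoised in parallel over several steps, and finished blocks are frozen and reused as context, which is what makes KV caching applicable.

```python
import random

random.seed(0)

MASK = -1          # sentinel for a masked (noised) token -- hypothetical
VOCAB = list(range(10))
BLOCK_SIZE = 4
NUM_BLOCKS = 3     # total length = BLOCK_SIZE * NUM_BLOCKS; extendable at will

def denoise_step(context, block):
    """Toy stand-in for the learned denoiser: unmasks a random subset of
    the still-masked positions in parallel, conditioned on `context`.
    A real model would predict token distributions here."""
    masked = [i for i, t in enumerate(block) if t == MASK]
    k = max(1, len(masked) // 2)          # unmask ~half the remaining positions
    for i in random.sample(masked, k):
        block[i] = random.choice(VOCAB)
    return block

def generate():
    context = []   # plays the role of the KV cache: past blocks are encoded
                   # once and reused as conditioning, never re-denoised
    for _ in range(NUM_BLOCKS):
        block = [MASK] * BLOCK_SIZE       # start each block fully noised
        while any(t == MASK for t in block):
            block = denoise_step(context, block)
        context.extend(block)             # "cache" the finished block
    return context

seq = generate()
print(len(seq), all(t != MASK for t in seq))
```

The autoregressive loop over blocks is what removes the fixed-length constraint of standard diffusion LMs, while the inner denoising loop retains parallel token sampling within each block.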