🤖 AI Summary
Existing discrete diffusion language models (e.g., masked diffusion) achieve reasonable performance but cannot revise previously generated tokens, which limits output quality. To address this, we propose Generalized Interpolating Discrete Diffusion (GIDD), the first unified theoretical framework for interpolating discrete diffusion, along with a novel diffusion evidence lower bound (ELBO). GIDD admits a generalized family of interpolating noise schedules, including a hybrid of masking and uniform noise, which enables self-correcting sampling: the model can detect and fix its own mistakes during generation. Under matched compute, GIDD achieves state-of-the-art performance in diffusion-based language modeling, with notable gains in sample quality and textual coherence. The code and pretrained models are publicly released.
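To make the hybrid masking-uniform corruption concrete, here is a toy sketch of a forward noising step. This is not the paper's actual formulation (GIDD uses a continuous-time interpolating schedule); the per-token corruption probability `t`, the mixing weight `p_uniform`, and the toy vocabulary are illustrative assumptions.

```python
import random

MASK = "[MASK]"
VOCAB = ["the", "cat", "sat", "on", "mat"]  # toy vocabulary (illustrative)

def gidd_noise(tokens, t, p_uniform=0.2, rng=random):
    """Corrupt a sequence with a mix of masking and uniform noise.

    Each token is corrupted with probability t; a corrupted token becomes
    [MASK] with probability (1 - p_uniform), or is replaced by a uniformly
    random vocabulary token otherwise.
    """
    out = []
    for tok in tokens:
        if rng.random() < t:  # corrupt this position
            if rng.random() < p_uniform:
                out.append(rng.choice(VOCAB))  # uniform noise
            else:
                out.append(MASK)  # masking noise
        else:
            out.append(tok)  # keep the clean token
    return out
```

Because some corrupted positions carry a plausible-but-wrong token rather than `[MASK]`, a model trained on this process must learn to judge and overwrite unmasked tokens, which is what unlocks self-correction at sampling time.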
📝 Abstract
While state-of-the-art language models achieve impressive results through next-token prediction, they have inherent limitations such as the inability to revise already generated tokens. This has prompted exploration of alternative approaches such as discrete diffusion. However, masked diffusion, which has emerged as a popular choice due to its simplicity and effectiveness, reintroduces this inability to revise tokens. To overcome this, we generalize masked diffusion and derive the theoretical backbone of a family of generalized interpolating discrete diffusion (GIDD) processes offering greater flexibility in the design of the noising process. Leveraging a novel diffusion ELBO, we achieve compute-matched state-of-the-art performance in diffusion language modeling. Exploiting GIDD's flexibility, we explore a hybrid approach combining masking and uniform noise, leading to improved sample quality and unlocking the ability for the model to correct its own mistakes, an area where autoregressive models have notoriously struggled. Our code and models are open-source: https://github.com/dvruette/gidd/
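The self-correction ability described above can be sketched as a simple post-hoc editing step: wherever the denoiser assigns low probability to the currently sampled token, resample that position. This is a hypothetical simplification, not the sampler from the paper; the `threshold` parameter and the dict-of-probabilities model output are assumptions for illustration.

```python
def self_correct(tokens, model_probs, threshold=0.1):
    """One illustrative self-correction step.

    `model_probs` stands in for the denoiser's output: a list (one entry per
    position) of dicts mapping token -> probability. Any position where the
    current token's probability falls below `threshold` is replaced by the
    model's argmax token at that position.
    """
    out = list(tokens)
    for i, tok in enumerate(tokens):
        probs = model_probs[i]
        if probs.get(tok, 0.0) < threshold:  # model disagrees with this token
            out[i] = max(probs, key=probs.get)  # replace with argmax token
    return out
```

An autoregressive sampler cannot do this, since earlier tokens are frozen once emitted; a diffusion model trained with uniform noise scores every position at every step, so such revisions come for free.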