Introspective Diffusion Language Models

📅 2026-04-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Diffusion language models (DLMs) underperform autoregressive (AR) models because they lack introspective consistency: unlike AR models, they often fail to endorse their own generated content. This work proposes the Introspective Diffusion Language Model (I-DLM), which inherits the introspective consistency of AR training and decodes with the Introspective Strided Decoding (ISD) algorithm, verifying previously generated tokens while advancing new ones in the same forward pass. This improves generation quality while preserving the parallel decoding of diffusion models. According to the authors, I-DLM is the first DLM to match the quality of its same-scale AR counterpart, and it outperforms prior DLMs across 15 benchmarks, scoring 69.6 on AIME-24 and 45.7 on LiveCodeBench-v6, with about 3x the throughput of prior state-of-the-art DLMs. The study also introduces the introspective acceptance rate as a new evaluation metric.

📝 Abstract

Diffusion language models (DLMs) promise parallel generation, yet still lag behind autoregressive (AR) models in quality. We trace this gap to a failure of introspective consistency: AR models agree with their own generations, while DLMs often do not. We define the introspective acceptance rate, which measures whether a model accepts its previously generated tokens. This reveals why AR training has a structural advantage: causal masking and logit shifting implicitly enforce introspective consistency. Motivated by this observation, we introduce the Introspective Diffusion Language Model (I-DLM), a paradigm that retains diffusion-style parallel decoding while inheriting the introspective consistency of AR training. I-DLM uses a novel introspective strided decoding (ISD) algorithm, which enables the model to verify previously generated tokens while advancing new ones in the same forward pass. From a systems standpoint, we build the I-DLM inference engine on AR-inherited optimizations and further customize it with a stationary-batch scheduler. To the best of our knowledge, I-DLM is the first DLM to match the quality of its same-scale AR counterpart while outperforming prior DLMs in both model quality and practical serving efficiency across 15 benchmarks. It reaches 69.6 on AIME-24 and 45.7 on LiveCodeBench-v6, exceeding LLaDA-2.1-mini (16B) by more than 26 and 15 points, respectively. Beyond quality, I-DLM is designed for the growing demand of large-concurrency serving, delivering about 3x higher throughput than prior state-of-the-art DLMs.
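
To make the introspective acceptance rate more concrete, here is a minimal Python sketch of one possible reading of it. The abstract only says the metric "measures whether a model accepts its previously generated tokens"; the acceptance rule used here (a token is accepted when it is the argmax of the distribution the model assigns to that position on a second pass) is an assumption, and the function name `introspective_acceptance_rate` and its inputs are hypothetical, not taken from the paper.

```python
from typing import Sequence


def introspective_acceptance_rate(
    generated: Sequence[int],
    rescored_probs: Sequence[Sequence[float]],
) -> float:
    """Fraction of generated tokens the model agrees with on a second pass.

    ASSUMPTION (not from the paper): a token is "accepted" when it is the
    argmax of the distribution the model assigns to its position when the
    finished sequence is scored again.

    generated:      token ids the model produced, one per position.
    rescored_probs: for each position, the model's probability over the
                    vocabulary when re-scoring its own output.
    """
    if len(generated) != len(rescored_probs):
        raise ValueError("need exactly one distribution per generated token")
    if not generated:
        return 0.0

    accepted = 0
    for token, probs in zip(generated, rescored_probs):
        # Index of the highest-probability token at this position.
        best = max(range(len(probs)), key=probs.__getitem__)
        accepted += int(best == token)
    return accepted / len(generated)


# Toy example with a 3-token vocabulary and 4 generated positions.
generated = [2, 0, 1, 1]
rescored_probs = [
    [0.1, 0.2, 0.7],  # argmax 2, matches the generated token
    [0.6, 0.3, 0.1],  # argmax 0, matches
    [0.5, 0.4, 0.1],  # argmax 0, generated token was 1: rejected
    [0.2, 0.7, 0.1],  # argmax 1, matches
]
rate = introspective_acceptance_rate(generated, rescored_probs)
print(rate)  # 0.75
```

Under this reading, a model that always agrees with its own output scores 1.0, and lower values indicate the kind of self-disagreement the abstract attributes to DLMs. A probabilistic acceptance rule, as used in speculative decoding, would be another plausible choice; the abstract does not say which one the authors use.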

Problem

Research questions and friction points this paper is trying to address.

diffusion language models

autoregressive models

introspective consistency

generation quality

parallel decoding

Innovation

Methods, ideas, or system contributions that make the work stand out.

Introspective Consistency

Diffusion Language Models

Introspective Strided Decoding