Diffusion In Diffusion: Reclaiming Global Coherence in Semi-Autoregressive Diffusion

📅 2026-01-20
📈 Citations: 1
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the limitation of existing block diffusion language models, which sacrifice global bidirectional context when incorporating autoregressive priors, leading to insufficient macro-level coherence in generated text. To overcome this, the authors propose a “draft-then-refine” framework: an initial draft is rapidly generated using small-block semi-autoregressive diffusion, followed by a refinement stage employing a global diffusion process with a larger bidirectional receptive field to restore full contextual modeling. The core innovation lies in a “diffusion-within-diffusion” mechanism, integrating snapshot-based confidence-aware remasking and mix-scale training to effectively mitigate the irreversibility and myopia inherent in block diffusion. Evaluated on OpenWebText, the method achieves a generation perplexity of 21.9 (down from 25.7) using only 26% of the baseline fine-tuning budget, substantially narrowing the performance gap with autoregressive models.
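The two-stage loop is straightforward to sketch. The Python below is a minimal illustration of the draft-then-refine idea, assuming a model that exposes a block denoiser and a global bidirectional denoiser; `denoise_block`, `denoise_global`, `MASK_ID`, and the remasking ratio are hypothetical placeholders, not the authors' released API.

```python
import torch

MASK_ID = 0  # assumed id of the [MASK] token in the vocabulary


def draft_then_refine(model, seq_len, block_size=16,
                      refine_steps=8, remask_ratio=0.1):
    """Stage 1: fast draft with small-block semi-autoregressive diffusion.
    Stage 2: global bidirectional refinement with confidence-aware remasking."""
    tokens = torch.full((1, seq_len), MASK_ID, dtype=torch.long)

    # Stage 1: denoise one small block at a time, left to right,
    # each block conditioned on everything generated so far.
    for start in range(0, seq_len, block_size):
        block = slice(start, min(start + block_size, seq_len))
        tokens[:, block] = model.denoise_block(tokens, block)  # hypothetical

    # Stage 2: snapshot per-token confidences over the full sequence,
    # remask the least confident tokens, and refill them with a global
    # bidirectional diffusion step.
    for _ in range(refine_steps):
        logits = model(tokens)                        # full-context forward pass
        probs = logits.softmax(dim=-1)
        conf = probs.gather(-1, tokens.unsqueeze(-1)).squeeze(-1)
        k = max(1, int(remask_ratio * seq_len))
        _, low_conf = conf.topk(k, dim=-1, largest=False)
        tokens[0, low_conf[0]] = MASK_ID              # re-open weakest tokens
        tokens = model.denoise_global(tokens)         # hypothetical global step
    return tokens
```

The key design point this captures is that refinement is targeted: rather than rerunning diffusion over the whole draft, only the positions the model itself is least confident about are re-opened, which is what lets the global stage repair myopic block-level choices cheaply.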

📝 Abstract
One of the most compelling features of global discrete diffusion language models is their global bidirectional contextual capability. However, existing block-based diffusion studies tend to introduce autoregressive priors, which, while offering benefits, can cause models to lose this global coherence at the macro level. To regain global contextual understanding while preserving the advantages of the semi-autoregressive paradigm, we propose Diffusion in Diffusion, a 'draft-then-refine' framework designed to overcome the irreversibility and myopia problems inherent in block diffusion models. Our approach first employs block diffusion to generate rapid drafts using small blocks, then refines these drafts through global bidirectional diffusion with a larger bidirectional receptive field. We utilize snapshot confidence remasking to identify the most critical tokens that require modification, and apply mix-scale training to expand the block diffusion model's global capabilities. Empirical results demonstrate that our approach sets a new benchmark for discrete diffusion models on the OpenWebText dataset. Using only 26% of the fine-tuning budget of baseline models, we reduce generative perplexity from 25.7 to 21.9, significantly narrowing the performance gap with autoregressive models.
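As a rough illustration of how mix-scale training could expand a block diffusion model's global capability, the sketch below samples a random block size per fine-tuning step, from small blocks up to the full sequence, and applies a standard masked-diffusion cross-entropy on the corrupted positions. The scale set, noise schedule, and `mask_id` are assumptions, not the paper's exact recipe.

```python
import random
import torch
import torch.nn.functional as F


def mix_scale_step(model, batch, scales=(16, 64, 256, 1024), mask_id=0):
    """One fine-tuning step that mixes block scales: small scales preserve
    the semi-autoregressive behaviour, while the full-length scale trains
    the global bidirectional denoiser used in the refine stage."""
    seq_len = batch.size(1)
    block_size = random.choice(scales)   # largest scale == full sequence
    tokens = batch.clone()

    # Corrupt each block independently at a random noise level, mimicking
    # the forward process of masked discrete diffusion.
    for start in range(0, seq_len, block_size):
        block = slice(start, min(start + block_size, seq_len))
        rate = random.random()                        # noise level in (0, 1)
        mask = torch.rand(tokens[:, block].shape) < rate
        tokens[:, block] = tokens[:, block].masked_fill(mask, mask_id)

    logits = model(tokens)
    # Masked-diffusion objective: cross-entropy on the masked slots only.
    target = batch.masked_fill(tokens != mask_id, -100)
    return F.cross_entropy(logits.view(-1, logits.size(-1)),
                           target.view(-1), ignore_index=-100)
```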
Problem

Research questions and friction points this paper is trying to address.

global coherence
semi-autoregressive diffusion
block diffusion
bidirectional context
discrete diffusion language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion in Diffusion
semi-autoregressive diffusion
global coherence
bidirectional contextual modeling
snapshot confidence remasking
Linrui Ma
Noah's Ark Lab, Huawei, Montreal, Canada & Beijing, China
Yufei Cui
McGill University, MILA
Medical AI · RAG · LLM Agent · Predictive Uncertainty
Kai Han
Noah's Ark Lab, Huawei, Montreal, Canada & Beijing, China
Yunhe Wang
Noah's Ark Lab, Huawei Technologies
Deep Learning · Language Model · Machine Learning · Computer Vision