Stop the Flip-Flop: Context-Preserving Verification for Fast Revocable Diffusion Decoding

📅 2026-02-05
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the issue of flip-flop oscillations in aggressive parallel decoding within diffusion-based language models, which often lead to context dilution and ineffective revisions, thereby degrading inference efficiency. To mitigate this, the authors propose COVER, a method that constructs dual attention views via KV cache overwriting to enable leave-one-out validation and stable draft generation within a single forward pass. This approach effectively prevents self-leakage while preserving contextual information. COVER further incorporates a context-preserving verification mechanism, stability-aware seed selection, and dynamic adjustment of the number of verified tokens, collectively suppressing flip-flop oscillations. Experimental results demonstrate that COVER significantly reduces redundant revisions, accelerates decoding speed, and maintains high generation quality.

📝 Abstract
Parallel diffusion decoding can accelerate diffusion language model inference by unmasking multiple tokens per step, but aggressive parallelism often harms quality. Revocable decoding mitigates this by rechecking earlier tokens, yet we observe that existing verification schemes frequently trigger flip-flop oscillations, where tokens are remasked and later restored unchanged. This behaviour slows inference in two ways: remasking verified positions weakens the conditioning context for parallel drafting, and repeated remask cycles consume the revision budget with little net progress. We propose COVER (Cache Override Verification for Efficient Revision), which performs leave-one-out verification and stable drafting within a single forward pass. COVER constructs two attention views via KV cache override: selected seeds are masked for verification, while their cached key-value states are injected for all other queries to preserve contextual information, with a closed-form diagonal correction preventing self-leakage at the seed positions. COVER further prioritises seeds using a stability-aware score that balances uncertainty, downstream influence, and cache drift, and it adapts the number of verified seeds per step. Across benchmarks, COVER markedly reduces unnecessary revisions and yields faster decoding while preserving output quality.
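The abstract's core mechanism (two attention views from one pass via KV cache override, with a diagonal correction against self-leakage) can be illustrated with a toy single-head attention in numpy. This is a minimal sketch under my own assumptions, not the paper's implementation: the function name `cover_attention` is invented, there is one head and no model, and verification is modeled only by masking each seed's own cached key, assuming seed queries already come from masked inputs.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; exp(-inf) -> 0, so masked scores get zero weight.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cover_attention(Q, K_cache, V_cache, seed_positions):
    """One attention pass yielding two effective views.

    Non-seed queries attend to every cached key-value pair, so the
    drafting context stays intact even while seeds are under review.
    Each seed query has its own cached key masked out (the diagonal
    correction), so its verification is leave-one-out: it cannot
    peek at its own previous state.
    """
    T, d = Q.shape
    scores = Q @ K_cache.T / np.sqrt(d)
    for s in seed_positions:
        scores[s, s] = -np.inf  # block self-leakage at seed positions only
    attn = softmax(scores, axis=-1)
    return attn @ V_cache, attn
```

The point of the sketch is only that a single masked score matrix realises both views at once: seed rows give the leave-one-out verification view, and all other rows give the context-preserving drafting view, with no second forward pass.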
Problem

Research questions and friction points this paper is trying to address.

revocable decoding
flip-flop oscillations
diffusion language models
parallel decoding
verification
Innovation

Methods, ideas, or system contributions that make the work stand out.

revocable decoding
diffusion language models
KV cache override
context preservation
stable drafting
Yanzheng Xiang
King's College London, UK
Lan Wei
Imperial College London, UK
Yizhen Yao
King's College London, UK
Qinglin Zhu
King's College London, UK
Hanqi Yan
King's College London, UK
Chen Jin
Associate Principal AI Scientist, AstraZeneca
Vision-Language Human-Machine Interactions · Multimodal Reasoning · Alignment and Explainable AI for H
Philip Alexander Teare
Centre for AI, Data Science & Artificial Intelligence, BioPharmaceuticals R&D, AstraZeneca, UK
Dandan Zhang
Imperial College London
Robotics · AI
Lin Gui
Assistant Professor, King's College London
Natural Language Processing · Computational Linguistics
Amrutha Saseendran
Centre for AI, Data Science & Artificial Intelligence, BioPharmaceuticals R&D, AstraZeneca, UK
Yulan He
Professor, King's College London; Turing AI Fellow
Natural Language Processing · Large Language Models · AI for education and health