🤖 AI Summary
Diffusion language models often suffer from decoding stagnation in block-wise parallel generation because early, irreversible commitments prevent error correction. To address this limitation, this work proposes Reversible Diffusion Decoding (RDD), the first framework enabling efficient reversible generation in diffusion language models. RDD caches internal model states to support backtracking without recomputation, and combines a stagnation-detection mechanism with confidence-guided remasking to selectively reinitialize low-confidence tokens. By lifting the constraint of irreversible generation, RDD substantially improves generation quality and robustness over existing baselines while introducing only minimal computational overhead.
📝 Abstract
Diffusion language models enable parallel token generation through block-wise decoding, but their irreversible commitments can lead to stagnation, where the reverse diffusion process fails to make further progress under a suboptimal context. We propose Reversible Diffusion Decoding (RDD), a decoding framework that introduces reversibility into block-wise diffusion generation. RDD detects stagnation as a state-dependent failure of the reverse process and enables efficient backtracking to earlier blocks without recomputation via cached model states. To avoid repeated failure trajectories, RDD applies confidence-guided remasking to selectively reinitialize uncertain tokens while preserving reliable context. This reversible formulation allows decoding to recover from early commitment errors while maintaining the parallel efficiency of diffusion-based generation. Experiments show that RDD improves generation robustness and quality over baselines with minimal computational overhead.
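The decoding loop described in the abstract can be sketched as follows. This is an illustrative toy, not the paper's implementation: the `ToyModel`, the confidence thresholds (`conf_threshold`, `remask_below`), the `patience` counter used for stagnation detection, and the snapshot list standing in for cached diffusion states are all assumptions made for the sketch.

```python
MASK = None  # placeholder for a masked (undecided) position

def decode_with_rdd(model, seq_len, block_size, conf_threshold=0.7,
                    remask_below=0.9, patience=3, max_steps=50):
    """Block-wise decoding with stagnation detection and reversible backtracking."""
    tokens = [MASK] * seq_len
    confs = [0.0] * seq_len
    cache = []  # block-boundary snapshots; stands in for cached model states
    block, stall = 0, 0
    for _ in range(max_steps):
        if block * block_size >= seq_len:
            break
        lo = block * block_size
        hi = min(lo + block_size, seq_len)
        progressed = False
        for i in range(lo, hi):
            if tokens[i] is MASK:
                tok, conf = model(tokens, i)
                if conf >= conf_threshold:          # commit only confident tokens
                    tokens[i], confs[i] = tok, conf
                    progressed = True
        if all(tokens[i] is not MASK for i in range(lo, hi)):
            cache.append((tokens[:], confs[:]))     # checkpoint completed block
            block, stall = block + 1, 0
        elif progressed:
            stall = 0
        else:
            stall += 1
            if stall >= patience and cache:         # stagnation detected
                tokens, confs = cache.pop()         # backtrack, no recomputation
                block, stall = block - 1, 0
                for i in range(block * block_size,
                               min((block + 1) * block_size, seq_len)):
                    if confs[i] < remask_below:     # confidence-guided remasking
                        tokens[i], confs[i] = MASK, 0.0
    return tokens

class ToyModel:
    """Deterministic stand-in: commits one plausible-but-bad token early,
    which collapses confidence in the next block until it is remasked."""
    def __init__(self):
        self.seen_pos1 = False
    def __call__(self, tokens, i):
        if i == 0:
            return "start", 0.95
        if i == 1:
            if not self.seen_pos1:
                self.seen_pos1 = True
                return "bad", 0.72   # just above threshold: an early commitment error
            return "good", 0.95      # corrected on the second attempt
        # second block: confidence depends on the earlier context
        return (f"tok{i}", 0.30) if tokens[1] == "bad" else (f"tok{i}", 0.90)

model = ToyModel()
print(decode_with_rdd(model, seq_len=4, block_size=2))
# → ['start', 'good', 'tok2', 'tok3']
```

In the run above, a greedy irreversible decoder would keep `"bad"` and stall forever on the second block; the backtracking branch restores the cached snapshot, remasks only the low-confidence position (0.72 < `remask_below`), and decoding completes.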