🤖 AI Summary
This work addresses a key limitation of existing decoding strategies for diffusion-based large language models: they show a local preference and overlook the heterogeneous information density within the context, which degrades generation quality. To remedy this, the authors propose FoCore, a training-free self-contrastive decoding strategy that identifies high-information-density (HD) tokens and temporarily re-masks them as negative samples to guide globally coherent generation. They further introduce an accelerated variant, FoCore_A, which, once HD tokens have converged, decodes stable candidates within a local context window in parallel to improve efficiency. Experiments on mathematical, code, and logical reasoning tasks show that FoCore consistently improves on baseline decoding; for example, it raises HumanEval pass@1 from 39.02 to 42.68 over standard Classifier-Free Guidance, while FoCore_A reduces the number of decoding steps by a factor of 2.07 and cuts per-sample latency by 58.4% (from 20.76s to 8.64s).
📝 Abstract
The iterative denoising paradigm of Diffusion Large Language Models (DLMs) endows them with a distinct advantage in global context modeling. However, current decoding strategies fail to leverage this capability, typically exhibiting a local preference that overlooks the heterogeneous information density within the context, ultimately degrading generation quality. To address this limitation, we systematically investigate high-information-density (HD) tokens and present two key findings: (1) explicitly conditioning on HD tokens substantially improves output quality; and (2) HD tokens exhibit an early-decoding tendency, converging earlier than surrounding tokens. Motivated by these findings, we propose Focus on the Core (FoCore), a training-free decoding strategy that utilizes HD tokens in a self-contrast manner, wherein HD tokens are temporarily remasked as negative samples, to guide generation. We further introduce FoCore_Accelerate (FoCore_A), an efficient variant that, upon detecting HD token convergence, performs parallel decoding over stable candidates within a local context window, substantially accelerating generation. Extensive experiments on math, code and logical reasoning benchmarks demonstrate that FoCore consistently improves generation quality and efficiency across both LLaDA and Dream backbones. For instance, on HumanEval, FoCore improves pass@1 from 39.02 to 42.68 over standard Classifier-Free Guidance, while FoCore_A reduces the number of decoding steps by 2.07x and per-sample latency from 20.76s to 8.64s (-58.4%).
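The self-contrast idea described above can be illustrated with a minimal sketch. The abstract does not give the exact guidance formula, so the code assumes a Classifier-Free-Guidance-style combination of two forward passes: one on the current context (positive) and one with the HD positions remasked (negative). The names `MASK`, `remask`, `self_contrast_logits`, `guided_step`, `score_fn`, and `weight` are all hypothetical and not taken from the paper.

```python
from typing import Callable, List, Sequence

MASK = "<mask>"  # hypothetical mask symbol, not the real tokenizer's


def remask(tokens: Sequence[str], hd_positions: Sequence[int]) -> List[str]:
    """Return a copy of `tokens` with the given HD positions replaced by MASK."""
    hd = set(hd_positions)
    return [MASK if i in hd else t for i, t in enumerate(tokens)]


def self_contrast_logits(
    pos: Sequence[float], neg: Sequence[float], weight: float
) -> List[float]:
    """Assumed CFG-style rule: pos + weight * (pos - neg)."""
    return [p + weight * (p - n) for p, n in zip(pos, neg)]


def guided_step(
    tokens: Sequence[str],
    hd_positions: Sequence[int],
    score_fn: Callable[[List[str]], List[float]],
    weight: float = 1.0,
) -> List[float]:
    """Score the context twice (with and without HD tokens) and contrast them."""
    pos = score_fn(list(tokens))                   # HD tokens visible
    neg = score_fn(remask(tokens, hd_positions))   # HD tokens remasked
    return self_contrast_logits(pos, neg, weight)


def toy_score(tokens: List[str]) -> List[float]:
    """Stand-in for a model: first logit counts the unmasked tokens."""
    visible = sum(1 for t in tokens if t != MASK)
    return [float(visible), 0.0]


guided = guided_step(["x", MASK, "y"], hd_positions=[0], score_fn=toy_score)
```

In this toy setup the first logit is amplified because it depends on the HD token at position 0, while the second logit, which ignores the context, is left unchanged. A real decoder would replace `toy_score` with a DLM forward pass over the vocabulary at each masked position.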