🤖 AI Summary
To address the excessive computational overhead in parallel text generation with diffusion language models (dLLMs)—stemming from the need to predict all future suffix tokens—this paper proposes a training-free, efficient inference method. The core innovation lies in combining sliding-window attention with a deterministic distance-decay dropout mechanism, which restricts attention to nearby suffix tokens and eliminates redundant computation over distant ones. The method is fully compatible with existing optimizations such as prefix caching and requires only minimal code modifications to integrate. Extensive experiments across multiple benchmark tasks and dLLMs of varying scales demonstrate that the proposed approach achieves up to a 61.4× inference speedup while preserving generation quality comparable to the original model.
📝 Abstract
Diffusion-based Large Language Models (dLLMs) parallelize text generation by framing decoding as a denoising process, but suffer from high computational overhead since they predict all future suffix tokens at each step while retaining only a small fraction. We propose Diffusion Scratchpad (DPad), a training-free method that restricts attention to a small set of nearby suffix tokens, preserving fidelity while eliminating redundancy. DPad integrates two strategies: (i) a sliding window, which maintains a fixed-length suffix window, and (ii) distance-decay dropout, which deterministically removes distant suffix tokens before attention computation. This simple design is compatible with existing optimizations such as prefix caching and can be implemented with only a few lines of code. Comprehensive evaluations across multiple benchmarks on LLaDA-1.5 and Dream models demonstrate that DPad delivers up to $\mathbf{61.4\times}$ speedup over vanilla dLLMs while maintaining comparable accuracy, highlighting its potential for efficient and scalable long-sequence inference. Our code is available at https://github.com/Crys-Chen/DPad.
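To make the two strategies concrete, here is a minimal sketch of how a suffix keep-mask could combine a fixed-length sliding window with deterministic, distance-decay dropout. The function name `suffix_keep_mask` and the parameters `window` and `decay` are illustrative assumptions, not the paper's actual implementation; the real DPad code is at the repository linked above.

```python
def suffix_keep_mask(num_suffix, window=8, decay=0.5):
    """Sketch of a DPad-style suffix selection mask (hypothetical API).

    All suffix tokens inside a fixed-length sliding window are kept;
    beyond it, tokens are kept at strides that grow with distance, so
    attention deterministically skips most far-away suffix positions.
    """
    keep = [False] * num_suffix
    # (i) sliding window: the nearest suffix tokens are always attended to
    for i in range(min(window, num_suffix)):
        keep[i] = True
    # (ii) distance-decay dropout: deterministic, training-free thinning
    # of distant suffix tokens; density falls off as the stride grows
    pos, stride = window, 1
    while pos < num_suffix:
        keep[pos] = True
        stride = max(stride + 1, int(stride / decay))
        pos += stride
    return keep
```

Because the mask is deterministic and depends only on position, it can be precomputed once per decoding step and applied before the attention computation, which is what makes the design cache-friendly.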