FastDiSS: Few-step Match Many-step Diffusion Language Model on Sequence-to-Sequence Generation--Full Version

📅 2026-04-07
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the significant approximation errors in continuous diffusion language models under few-step sampling, which arise from inaccurate self-conditioning signals and degrade generation quality. To mitigate this issue, the authors propose a novel training framework that explicitly models self-conditioning errors during training by perturbing the self-conditioning signal to align with inference-time noise levels. Additionally, a token-level noise-aware mechanism is introduced to enhance the model's robustness to prior estimation errors and alleviate training saturation. Evaluated across multiple conditional generation benchmarks, the proposed method outperforms standard continuous diffusion models and existing single-step diffusion approaches, achieving up to a 400× speedup in inference while maintaining superior generation performance.
πŸ“ Abstract
Self-conditioning has been central to the success of continuous diffusion language models, as it allows models to correct previous errors. Yet its ability degrades precisely in the regime where diffusion is most attractive for deployment: few-step sampling for fast inference. In this study, we show that when models have only a few denoising steps, inaccurate self-conditioning induces a substantial approximation gap; these errors compound across denoising steps and ultimately dominate sample quality. To address this, we propose a novel training framework that handles these errors during learning by perturbing the self-conditioning signal to match inference noise, improving robustness to prior estimation errors. In addition, we introduce a token-level noise-awareness mechanism that prevents training from saturating, thereby improving optimization. Extensive experiments across conditional generation benchmarks demonstrate that our framework surpasses standard continuous diffusion models while providing up to 400× faster inference, and remains competitive with other one-step diffusion frameworks.
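To make the training idea concrete, below is a minimal PyTorch-style sketch of one training step in which the self-conditioning estimate is deliberately perturbed before the second forward pass, so the model learns under the imperfect prior estimates it will encounter at few-step inference. The `model` signature, tensor shapes, `noise_schedule`, and `sc_noise_scale` are illustrative assumptions rather than the authors' implementation; the token-level noise-aware weighting is only indicated in a comment.

```python
import torch

def training_step(model, x0_emb, timesteps, noise_schedule, sc_noise_scale=0.1):
    """Sketch of a self-conditioned diffusion training step with a perturbed
    self-conditioning signal (assumed interfaces, not the paper's code)."""
    # Diffuse clean token embeddings x0_emb (batch, seq, dim) to the sampled timesteps.
    alpha_bar = noise_schedule(timesteps)                      # assumed to return (batch,)
    a = alpha_bar.sqrt()[:, None, None]
    s = (1.0 - alpha_bar).sqrt()[:, None, None]
    eps = torch.randn_like(x0_emb)
    x_t = a * x0_emb + s * eps

    # First pass: obtain a self-conditioning estimate without gradients,
    # as in standard self-conditioned diffusion training.
    with torch.no_grad():
        x0_sc = model(x_t, timesteps, self_cond=None)

    # Perturb the estimate so training sees prior-estimation errors comparable
    # to those observed under few-step sampling at inference time.
    x0_sc_noisy = x0_sc + sc_noise_scale * torch.randn_like(x0_sc)

    # Second pass: predict the clean embeddings conditioned on the perturbed estimate.
    x0_pred = model(x_t, timesteps, self_cond=x0_sc_noisy)

    # Plain x0-prediction MSE; the paper's token-level noise-aware mechanism would
    # reweight this loss per token (omitted in this sketch).
    return torch.mean((x0_pred - x0_emb) ** 2)
```

In standard training the self-conditioning input is the model's own clean first-pass estimate; injecting noise into that estimate is what aligns the training distribution with the error levels seen when only a few denoising steps are available.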
Problem

Research questions and friction points this paper is trying to address.

self-conditioning
few-step sampling
diffusion language model
approximation gap
inference noise
Innovation

Methods, ideas, or system contributions that make the work stand out.

self-conditioning
few-step diffusion
noise-awareness
fast inference
sequence-to-sequence generation
Authors
Dat Nguyen-Cong
FPT Software AI Center, FPT Corporation
Tung Kieu
Aalborg University, Department of Computer Science
Data Mining, Data Management, Spatio-Temporal Data, Time Series Analysis
Hoang Thanh-Tung
Quantum AI and Cyber Security Institute, FPT Corporation