DRiffusion: Draft-and-Refine Process Parallelizes Diffusion Models with Ease

πŸ“… 2026-03-26
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Diffusion models suffer from high latency due to their iterative sampling process, hindering their applicability in interactive scenarios. This work introduces the draft-and-refine paradigm to diffusion models for the first time, proposing a parallel sampling framework that generates draft states for multiple future timesteps via skip-step sampling and concurrently computes their noise residuals, which are then integrated into the standard denoising trajectory. The method reduces sampling time to a theoretical fraction of 1/n or 2/(n+1) of the sequential baseline, depending on the sampling mode, where n is the number of devices, and requires no model retraining. Across various diffusion architectures, it delivers 1.4–3.7× inference acceleration with negligible degradation in generation quality: FID and CLIP scores on MS-COCO remain nearly unchanged, while PickScore and HPSv2.1 decrease by only 0.17 and 0.43 on average, respectively.
πŸ“ Abstract
Diffusion models have achieved remarkable success in generating high-fidelity content but suffer from slow, iterative sampling, resulting in high latency that limits their use in interactive applications. We introduce DRiffusion, a parallel sampling framework that parallelizes diffusion inference through a draft-and-refine process. DRiffusion employs skip transitions to generate multiple draft states for future timesteps and computes their corresponding noises in parallel, which are then used in the standard denoising process to produce refined results. Theoretically, our method achieves an acceleration rate of $\tfrac{1}{n}$ or $\tfrac{2}{n+1}$, depending on whether the conservative or aggressive mode is used, where $n$ denotes the number of devices. Empirically, DRiffusion attains 1.4$\times$-3.7$\times$ speedup across multiple diffusion models while incurring minimal degradation in generation quality: on the MS-COCO dataset, both FID and CLIP remain largely on par with those of the original model, while PickScore and HPSv2.1 show only minor average drops of 0.17 and 0.43, respectively. These results verify that DRiffusion delivers substantial acceleration and preserves perceptual quality.
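The draft-then-refine schedule described in the abstract can be sketched in a toy form. The snippet below is an illustrative sketch only, not the paper's implementation: `predict_noise` is a hypothetical stand-in for the diffusion model's noise predictor, the update rule is a trivial deterministic step, and `ThreadPoolExecutor` stands in for the multi-device parallel noise evaluation. In this toy the skip transition happens to trace the exact trajectory, so the refined result matches the sequential baseline; in the real method the drafts are approximations.

```python
from concurrent.futures import ThreadPoolExecutor


def predict_noise(x, t):
    # Hypothetical stand-in for the model's noise prediction eps(x, t).
    return 0.1 * x * (t / 10.0)


def sequential_sample(x, timesteps):
    """Baseline: one model call per timestep, strictly sequential."""
    for t in timesteps:
        x = x - predict_noise(x, t)
    return x


def draft_and_refine_sample(x, timesteps, n_devices=4):
    """Sketch of draft-and-refine: skip transitions produce draft states
    for the next n timesteps, their noises are evaluated concurrently
    (one per "device"), and those noises drive the refined updates."""
    i = 0
    while i < len(timesteps):
        block = timesteps[i:i + n_devices]
        # Draft phase: cheap skip-step states approximating future
        # points on the denoising trajectory.
        drafts, xd = [], x
        for t in block:
            drafts.append(xd)
            xd = xd - predict_noise(xd, t)  # skip transition
        # Parallel phase: evaluate the noise at every draft state at once.
        with ThreadPoolExecutor(max_workers=n_devices) as pool:
            noises = list(pool.map(predict_noise, drafts, block))
        # Refine phase: apply the precomputed noises along the
        # standard denoising trajectory.
        for eps in noises:
            x = x - eps
        i += n_devices
    return x
```

With n devices, each block of n steps costs roughly one model-call of wall-clock time for the parallel noise evaluations plus the (much cheaper) draft transitions, which is where the 1/n-style acceleration in the abstract comes from.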
Problem

Research questions and friction points this paper is trying to address.

diffusion models
slow sampling
high latency
interactive applications
parallel inference
Innovation

Methods, ideas, or system contributions that make the work stand out.

diffusion models
parallel sampling
draft-and-refine
skip transitions
accelerated inference
πŸ”Ž Similar Papers
No similar papers found.
Runsheng Bai
CSAIL, MIT
Chengyu Zhang
Department of Computer Science, Loughborough University
Software Engineering · Programming Languages · Formal Methods
Yangdong Deng
School of Software, Tsinghua University; FuturististAI Lab, Shanghai Tsinghua International Innovation Center