Accelerating Diffusion via Hybrid Data-Pipeline Parallelism Based on Conditional Guidance Scheduling

📅 2026-02-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the high computational cost of diffusion model inference and the limited scalability of existing distributed parallelization methods, which often fail to achieve linear speedup across multiple GPUs and may introduce generation artifacts. The authors propose a novel hybrid parallel framework that partitions computation based on conditional and unconditional denoising paths. By integrating conditional guidance scheduling, a hybrid data-pipeline parallelism strategy, and an adaptive parallel mode switching mechanism, the approach simultaneously optimizes inference speed and generation fidelity on both U-Net and DiT architectures. Experiments demonstrate that, using two NVIDIA RTX 3090 GPUs, the method reduces inference latency by 2.31× for SDXL and 2.07× for SD3 while preserving image quality, significantly outperforming current acceleration techniques.
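The condition-based partitioning described above can be pictured as running the two classifier-free-guidance branches on separate devices. The sketch below is a minimal PyTorch illustration under that assumption; the function and model names are hypothetical, and it omits the paper's pipeline scheduling and adaptive mode switching.

```python
import torch

@torch.no_grad()
def cfg_step_two_gpus(denoiser0, denoiser1, x_t, t, cond_emb, uncond_emb,
                      guidance_scale=7.5):
    """Run the conditional and unconditional branches of classifier-free
    guidance on separate GPUs, then combine the two noise predictions.

    denoiser0 / denoiser1 are replicas of the same noise-prediction
    network placed on cuda:0 and cuda:1 (illustrative names, not the
    authors' API).
    """
    # Conditional branch on GPU 0.
    eps_cond = denoiser0(x_t.to("cuda:0"), t, cond_emb.to("cuda:0"))
    # Unconditional branch on GPU 1; CUDA kernel launches are
    # asynchronous, so the two branches overlap in time.
    eps_uncond = denoiser1(x_t.to("cuda:1"), t, uncond_emb.to("cuda:1"))
    # Standard CFG combination: eps = eps_uncond + s * (eps_cond - eps_uncond).
    eps_uncond = eps_uncond.to("cuda:0")
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```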

📝 Abstract
Diffusion models have achieved remarkable progress in high-fidelity image, video, and audio generation, yet inference remains computationally expensive. However, current diffusion acceleration methods based on distributed parallelism suffer from noticeable generation artifacts and fail to achieve acceleration proportional to the number of GPUs. Therefore, we propose a hybrid parallelism framework that combines a novel data parallel strategy, condition-based partitioning, with an optimal pipeline scheduling method, adaptive parallelism switching, to reduce generation latency while maintaining high generation quality in conditional diffusion models. The key ideas are to (i) leverage the conditional and unconditional denoising paths as a new data-partitioning perspective and (ii) adaptively enable optimal pipeline parallelism according to the denoising discrepancy between these two paths. Our framework achieves 2.31× and 2.07× latency reductions on SDXL and SD3, respectively, using two NVIDIA RTX 3090 GPUs, while preserving image quality. This result confirms the generality of our approach across U-Net-based diffusion models and DiT-based flow-matching architectures. Our approach also outperforms existing methods in acceleration under high-resolution synthesis settings. Code is available at https://github.com/kaist-dmlab/Hybridiff.
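Key idea (ii) hinges on measuring how far the conditional and unconditional predictions have diverged at a given denoising step. Below is a hedged sketch of one possible switching criterion; the metric, threshold, and mode names are illustrative assumptions, not details taken from the paper.

```python
import torch

def guidance_gap(eps_cond: torch.Tensor, eps_uncond: torch.Tensor) -> float:
    """Relative L2 gap between the conditional and unconditional noise
    predictions at the current denoising step."""
    num = (eps_cond - eps_uncond).norm()
    den = eps_uncond.norm().clamp(min=1e-8)
    return (num / den).item()

def pick_parallel_mode(gap: float, threshold: float = 0.05) -> str:
    """Illustrative switching rule: while the two branches still disagree,
    keep the condition-based data split; once they converge, fall back to
    pipeline parallelism. The threshold value is made up for illustration."""
    return "condition_split" if gap > threshold else "pipeline"
```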
Problem

Research questions and friction points this paper is trying to address.

diffusion models
inference acceleration
distributed parallelism
generation artifacts
conditional guidance
Innovation

Methods, ideas, or system contributions that make the work stand out.

hybrid parallelism
conditional guidance scheduling
diffusion acceleration
adaptive pipeline scheduling
data partitioning
Euisoo Jung
School of Computing, KAIST
Byunghyun Kim
School of Computing, KAIST
Hyunjin Kim
KAIST
Computer Vision
Seonghye Cho
School of Computing, KAIST
Jae-Gil Lee
Professor, School of Computing, KAIST
big data, data mining