🤖 AI Summary
Diffusion model inference is costly: the number of sequential sampling steps in standard samplers scales linearly with the data dimension $d$ or the number of time steps $T$, hindering practical deployment. To address this, we propose a block-parallel Picard iteration framework for accelerated sampling: the sampling process is divided into a constant number of blocks, and the Picard iterations within each block are fully parallelizable. We provide the first rigorous proof that this approach achieves sublinear, specifically $\widetilde{\mathcal{O}}(\mathrm{poly}\log d)$, overall time complexity, breaking the linear-in-$d$ bottleneck. The analysis rests on a generalized version of Girsanov's theorem and covers both the stochastic differential equation (SDE) and probability flow ordinary differential equation (ODE) implementations of diffusion sampling. Because the iterations within a block are independent, the scheme parallelizes naturally across large-memory GPU clusters, pointing toward fast and efficient sampling of high-dimensional scientific and image data with theoretical guarantees.
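In scheme form, one Picard sweep replaces sequential integration with a simultaneous update of a whole block of time points. Written here for the probability flow ODE $\mathrm{d}x_t = f(x_t, t)\,\mathrm{d}t$ (the notation below is illustrative, not taken from the paper):

$$x^{(k+1)}(t) \;=\; x(0) + \int_0^{t} f\big(x^{(k)}(s), s\big)\,\mathrm{d}s, \qquad t \in [0, T_{\mathrm{block}}],$$

where the new iterate $x^{(k+1)}$ is computed for all $t$ in the block at once from the previous iterate $x^{(k)}$, so all drift evaluations within a sweep are independent and can run in parallel.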
📝 Abstract
Diffusion models have become a leading method for generative modeling of both image and scientific data. As these models are costly to train and evaluate, reducing the inference cost for diffusion models remains a major goal. Inspired by the recent empirical success in accelerating diffusion models via the parallel sampling technique~\cite{shih2024parallel}, we propose to divide the sampling process into $\mathcal{O}(1)$ blocks with parallelizable Picard iterations within each block. Rigorous theoretical analysis reveals that our algorithm achieves $\widetilde{\mathcal{O}}(\mathrm{poly}\log d)$ overall time complexity, marking the first implementation with provable sub-linear complexity w.r.t. the data dimension $d$. Our analysis is based on a generalized version of Girsanov's theorem and is compatible with both the SDE and probability flow ODE implementations. Our results shed light on the potential of fast and efficient sampling of high-dimensional data on fast-evolving modern large-memory GPU clusters.
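As a concrete illustration, here is a minimal NumPy sketch of block-wise Picard iteration for an Euler-discretized probability flow ODE. The drift function stands in for a learned score network, and all names (`picard_block_sample`, `drift`, `t_grid`) are ours for illustration; this is not the paper's implementation.

```python
import numpy as np

def picard_block_sample(drift, x0, t_grid, n_iters=8, tol=1e-6):
    """One block of parallel Picard iteration for dx/dt = drift(x, t).

    Every grid point in the block is updated from the *previous* Picard
    iterate, so the n drift evaluations per sweep are independent: on a
    GPU they would be one batched call instead of n sequential ones.
    """
    n = len(t_grid)
    dt = np.diff(t_grid)                       # (n-1,) step sizes
    xs = np.tile(x0, (n, 1))                   # initial guess: constant path

    for _ in range(n_iters):
        # Stands in for a single batched score-network evaluation.
        f = np.stack([drift(xs[j], t_grid[j]) for j in range(n)])
        increments = f[:-1] * dt[:, None]      # Euler quadrature pieces
        xs_new = np.vstack([x0, x0 + np.cumsum(increments, axis=0)])
        if np.max(np.abs(xs_new - xs)) < tol:  # fixed point (= Euler solution)
            xs = xs_new
            break
        xs = xs_new
    return xs[-1]                              # state at the block's endpoint

if __name__ == "__main__":
    # Toy linear drift standing in for a learned model; exact answer known.
    def drift(x, t):
        return -x

    t = np.linspace(0.0, 1.0, 65)              # 64 Euler steps on [0, 1]
    x = np.array([1.0, -2.0])
    n_blocks, m = 4, 16                        # O(1) sequential blocks
    for b in range(n_blocks):
        block = t[b * m : (b + 1) * m + 1]     # consecutive blocks share endpoints
        x = picard_block_sample(drift, x, block)
    print(x)                                   # close to exp(-1) * [1, -2]
```

With $\mathcal{O}(1)$ blocks and, under the paper's analysis, only polylogarithmically many Picard sweeps per block, the sequential depth no longer grows with $d$, which is the source of the claimed $\widetilde{\mathcal{O}}(\mathrm{poly}\log d)$ complexity; the trade-off is that each sweep evaluates the model at every point in the block, shifting the cost into parallel compute and memory.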