🤖 AI Summary
Diffusion model inference is costly: the number of sequential sampling steps in standard samplers scales linearly with the data dimension $d$ or the number of time steps $T$, hindering practical deployment. To address this, we propose a block-parallel Picard iteration framework for accelerated sampling: the sampling process is divided into a constant number of blocks, and the Picard iterations within each block are fully parallelizable. We provide the first rigorous proof that this approach achieves sublinear, specifically $\widetilde{\mathcal{O}}(\mathrm{poly}\log d)$, overall time complexity, breaking the linear-in-$d$ bottleneck. The analysis rests on a generalized version of Girsanov's theorem and covers both the stochastic differential equation (SDE) and probability flow ordinary differential equation (ODE) implementations of diffusion sampling. Because the iterations within a block are independent, the scheme parallelizes naturally across large-memory GPU clusters, pointing toward fast and efficient sampling of high-dimensional scientific and image data with theoretical guarantees.
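In scheme form, one Picard sweep replaces sequential integration with a simultaneous update of a whole block of time points. Written here for the probability flow ODE $\mathrm{d}x_t = f(x_t, t)\,\mathrm{d}t$ (the notation below is illustrative, not taken from the paper):

$$x^{(k+1)}(t) \;=\; x(0) + \int_0^{t} f\big(x^{(k)}(s), s\big)\,\mathrm{d}s, \qquad t \in [0, T_{\mathrm{block}}],$$

where the new iterate $x^{(k+1)}$ is computed for all $t$ in the block at once from the previous iterate $x^{(k)}$, so all drift evaluations within a sweep are independent and can run in parallel.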
📝 Abstract
Diffusion models have become a leading method for generative modeling of both image and scientific data. As these models are costly to train and evaluate, reducing the inference cost for diffusion models remains a major goal. Inspired by the recent empirical success in accelerating diffusion models via the parallel sampling technique~\cite{shih2024parallel}, we propose to divide the sampling process into $\mathcal{O}(1)$ blocks with parallelizable Picard iterations within each block. Rigorous theoretical analysis reveals that our algorithm achieves $\widetilde{\mathcal{O}}(\mathrm{poly}\log d)$ overall time complexity, marking the first implementation with provable sub-linear complexity w.r.t. the data dimension $d$. Our analysis is based on a generalized version of Girsanov's theorem and is compatible with both the SDE and probability flow ODE implementations. Our results shed light on the potential of fast and efficient sampling of high-dimensional data on fast-evolving modern large-memory GPU clusters.
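As a concrete illustration, here is a minimal NumPy sketch of block-wise Picard iteration for an Euler-discretized probability flow ODE. The drift function stands in for a learned score network, and all names (`picard_block_sample`, `drift`, `t_grid`) are ours for illustration; this is not the paper's implementation.

```python
import numpy as np

def picard_block_sample(drift, x0, t_grid, n_iters=8, tol=1e-6):
    """One block of parallel Picard iteration for dx/dt = drift(x, t).

    Every grid point in the block is updated from the *previous* Picard
    iterate, so the n drift evaluations per sweep are independent: on a
    GPU they would be one batched call instead of n sequential ones.
    """
    n = len(t_grid)
    dt = np.diff(t_grid)                       # (n-1,) step sizes
    xs = np.tile(x0, (n, 1))                   # initial guess: constant path

    for _ in range(n_iters):
        # Stands in for a single batched score-network evaluation.
        f = np.stack([drift(xs[j], t_grid[j]) for j in range(n)])
        increments = f[:-1] * dt[:, None]      # Euler quadrature pieces
        xs_new = np.vstack([x0, x0 + np.cumsum(increments, axis=0)])
        if np.max(np.abs(xs_new - xs)) < tol:  # fixed point (= Euler solution)
            xs = xs_new
            break
        xs = xs_new
    return xs[-1]                              # state at the block's endpoint

if __name__ == "__main__":
    # Toy linear drift standing in for a learned model; exact answer known.
    def drift(x, t):
        return -x

    t = np.linspace(0.0, 1.0, 65)              # 64 Euler steps on [0, 1]
    x = np.array([1.0, -2.0])
    n_blocks, m = 4, 16                        # O(1) sequential blocks
    for b in range(n_blocks):
        block = t[b * m : (b + 1) * m + 1]     # consecutive blocks share endpoints
        x = picard_block_sample(drift, x, block)
    print(x)                                   # close to exp(-1) * [1, -2]
```

With $\mathcal{O}(1)$ blocks and, under the paper's analysis, only polylogarithmically many Picard sweeps per block, the sequential depth no longer grows with $d$, which is the source of the claimed $\widetilde{\mathcal{O}}(\mathrm{poly}\log d)$ complexity; the trade-off is that each sweep evaluates the model at every point in the block, shifting the cost into parallel compute and memory.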