🤖 AI Summary
Diffusion models produce high-fidelity samples but suffer from low sampling efficiency. This paper introduces TADA, a training-free diffusion sampling acceleration framework. Methodologically, TADA (1) proposes a novel high-dimensional initial-noise mechanism enabling controllable fine-detail generation in a few steps; (2) establishes a theoretical equivalence between momentum-based diffusion models and standard diffusion models, naturally incorporating stochastic differential equation (SDE) properties; and (3) constructs a training-free sampling framework based on ordinary differential equation (ODE) solvers, compatible with both pixel- and latent-space models, as well as class- and text-conditioned architectures. Evaluated on ImageNet512, TADA significantly outperforms existing state-of-the-art samplers. It demonstrates strong generalization and robustness across diverse foundation models, including EDM, EDM2, and Stable Diffusion 3, with up to a 186% speedup while preserving Fréchet Inception Distance (FID) performance.
📝 Abstract
Diffusion models have demonstrated exceptional capabilities in generating high-fidelity images but typically suffer from inefficient sampling. Many solver designs and noise-scheduling strategies have been proposed to dramatically improve sampling speed. In this paper, we introduce a new sampling method that is up to $186\%$ faster than the current state-of-the-art solver at comparable FID on ImageNet512. This new sampling method is training-free and uses an ordinary differential equation (ODE) solver. The key to our method lies in using higher-dimensional initial noise, allowing existing pretrained diffusion models to produce more detailed samples with fewer function evaluations. In addition, by design our solver allows the level of detail to be controlled through a simple hyper-parameter at no extra computational cost. We show how our approach leverages momentum dynamics by establishing a fundamental equivalence between momentum diffusion models and conventional diffusion models with respect to their training paradigms. Moreover, we observe that the use of higher-dimensional noise naturally exhibits characteristics similar to stochastic differential equations (SDEs). Finally, we demonstrate strong performance on a set of representative pretrained diffusion models, including EDM, EDM2, and Stable Diffusion 3, which cover models in both pixel and latent spaces, as well as class- and text-conditional settings. The code is available at https://github.com/apple/ml-tada.
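To make the core idea concrete, here is a minimal, hypothetical sketch of ODE sampling from a higher-dimensional initial state. It is not the paper's actual algorithm (see the repository above for that): `drift_fn`, the auxiliary "momentum" channel `v`, and the knob `gamma` (standing in for the detail-control hyper-parameter mentioned in the abstract) are all illustrative assumptions.

```python
import numpy as np

def sample_highdim_noise(drift_fn, shape, num_steps=8, gamma=0.5, seed=0):
    # Illustrative sketch only, not the paper's exact method: draw an
    # initial state with extra noise dimensions (a "momentum" channel)
    # and integrate a probability-flow-style ODE with plain Euler steps.
    # `gamma` is a hypothetical knob for how strongly the extra noise
    # perturbs each solver query; it adds no extra model evaluations.
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)   # standard initial noise
    v = rng.standard_normal(shape)   # higher-dimensional (momentum) noise
    ts = np.linspace(1.0, 0.0, num_steps + 1)
    for t0, t1 in zip(ts[:-1], ts[1:]):
        dt = t1 - t0                                  # negative: t -> 0
        x = x + dt * drift_fn(x + gamma * t0 * v, t0)
        v = v + dt * v                                # momentum fades as t -> 0
    return x

# Toy drift (dx/dt = x): integrating backward in time contracts the state,
# so the output stays bounded; a real model would supply the learned drift.
out = sample_highdim_noise(lambda x, t: x, (4, 4), num_steps=16, gamma=0.3)
```

Note how `gamma` only reweights the solver's query points: the same pretrained drift function is called the same number of times, which is why such a knob is free at sampling time.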