🤖 AI Summary
Denoising diffusion bridge models (DDBMs) suffer from prohibitively high computational cost, requiring hundreds of network evaluations per sample. To address this, we propose a training-free fast sampling framework. Our approach (1) generalizes DDBMs to non-Markovian discrete-time diffusion bridges, enabling flexible trajectory modeling; (2) derives diffusion bridge implicit models (DBIMs), whose deterministic limit yields a novel ordinary differential equation (ODE) formulation compatible with high-order numerical solvers for accelerated inference; and (3) introduces a booting noise mechanism in the initial sampling step that preserves generation diversity without any retraining. Experiments on image translation demonstrate a 25× speedup over standard DDBM sampling while maintaining high-fidelity reconstruction and semantically coherent interpolation, achieving, for the first time, efficient, training-free fast sampling for DDBMs.
📝 Abstract
Denoising diffusion bridge models (DDBMs) are a powerful variant of diffusion models for interpolating between two arbitrary paired distributions given as endpoints. Despite their promising performance in tasks like image translation, DDBMs require a computationally intensive sampling process that involves the simulation of a (stochastic) differential equation through hundreds of network evaluations. In this work, we take the first step toward fast sampling of DDBMs without extra training, motivated by the well-established recipes in diffusion models. We generalize DDBMs via a class of non-Markovian diffusion bridges defined on the discretized sampling timesteps, which share the same marginal distributions and training objectives, give rise to generative processes ranging from stochastic to deterministic, and result in diffusion bridge implicit models (DBIMs). DBIMs are not only up to 25$\times$ faster than the vanilla sampler of DDBMs but also induce a novel, simple, and insightful form of ordinary differential equation (ODE) which inspires high-order numerical solvers. Moreover, DBIMs maintain generation diversity in a distinguished way, by using a booting noise in the initial sampling step, which enables faithful encoding, reconstruction, and semantic interpolation in image translation tasks. Code is available at https://github.com/thu-ml/DiffusionBridge.
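The family of "generative processes ranging from stochastic to deterministic" mirrors the DDIM construction for plain diffusion models that motivates this work. As a minimal sketch of that idea (this is the standard DDIM-style implicit update, not the DBIM bridge update itself; the function name, signature, and schedule values are illustrative assumptions):

```python
import numpy as np

def implicit_step(x_t, x0_pred, alpha_t, alpha_s, eta=0.0, rng=None):
    """One implicit (DDIM-style) update from time t to an earlier time s.

    alpha_t, alpha_s are cumulative signal-retention coefficients with
    alpha_s > alpha_t. eta=0 gives a fully deterministic step; eta=1
    recovers a stochastic (ancestral-like) step. Illustrative only.
    """
    # Noise implied by the current state and the network's x0 prediction.
    eps_pred = (x_t - np.sqrt(alpha_t) * x0_pred) / np.sqrt(1.0 - alpha_t)
    # Per-step noise scale interpolating deterministic <-> stochastic.
    sigma = eta * np.sqrt((1 - alpha_s) / (1 - alpha_t)) \
                * np.sqrt(1 - alpha_t / alpha_s)
    # Implicit update: re-anchor on x0_pred, reuse the implied direction.
    mean = np.sqrt(alpha_s) * x0_pred \
         + np.sqrt(1.0 - alpha_s - sigma**2) * eps_pred
    if eta == 0.0:
        return mean
    rng = np.random.default_rng() if rng is None else rng
    return mean + sigma * rng.standard_normal(x_t.shape)
```

With `eta=0` every step is a pure function of the network prediction, which is what makes deterministic fast samplers, ODE formulations, and faithful encoding/reconstruction possible; the paper's contribution is constructing the analogous non-Markovian family for diffusion *bridges*, where both endpoints are fixed.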