🤖 AI Summary
To address the high computational cost of diffusion model sampling—specifically the large number of neural function evaluations (NFEs) required—this paper proposes LD3, a lightweight, plug-and-play framework. LD3 is the first method to formulate timestep selection as a learnable, parameterized process without modifying or retraining the backbone diffusion model, and it is compatible with mainstream solvers such as DDIM and Heun. Leveraging gradient-driven timestep optimization and adaptive ODE discretization, LD3 achieves significantly improved sampling efficiency while maintaining theoretical convergence guarantees. Trained in only 5–10 minutes on CIFAR-10 and AFHQv2, LD3 attains FID scores of 2.38 and 2.27 using merely 10 NFEs—substantially reducing computational cost and improving generation quality over baselines. Its core innovation lies in decoupling the learning of time discretization from model weight updates, enabling efficient, robust, and architecture-agnostic acceleration.
📝 Abstract
Diffusion Probabilistic Models (DPMs) are generative models showing competitive performance in various domains, including image synthesis and 3D point cloud generation. Sampling from pre-trained DPMs involves multiple neural function evaluations (NFEs) to transform Gaussian noise samples into images, resulting in higher computational costs compared to single-step generative models such as GANs or VAEs. Therefore, reducing the number of NFEs while preserving generation quality is crucial. To address this, we propose LD3, a lightweight framework designed to learn the optimal time discretization for sampling. LD3 can be combined with various samplers and consistently improves generation quality without having to retrain resource-intensive neural networks. We demonstrate analytically and empirically that LD3 improves sampling efficiency with much less computational overhead. We evaluate our method with extensive experiments on 7 pre-trained models, covering unconditional and conditional sampling in both pixel-space and latent-space DPMs. We achieve FIDs of 2.38 (10 NFE) and 2.27 (10 NFE) on unconditional CIFAR-10 and AFHQv2 within 5–10 minutes of training. LD3 offers an efficient approach to sampling from pre-trained diffusion models. Code is available at https://github.com/vinhsuhi/LD3.
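The core idea — treating the sampling time grid as learnable parameters optimized by gradient descent while the backbone model stays frozen — can be illustrated with a toy sketch. Everything below is an assumption-laden stand-in, not LD3's actual objective, solver, or parameterization: a one-dimensional ODE `dx/dt = -t*x` with a known closed-form solution plays the role of the probability-flow ODE, an Euler solver stands in for DDIM/Heun, a squared error against the exact endpoint serves as the surrogate training signal, and finite differences replace backpropagation.

```python
import numpy as np

def timesteps(theta, T=1.0, t_min=0.01):
    """Map unconstrained params theta to a strictly decreasing grid
    T = t_0 > t_1 > ... > t_N = t_min (softmax over interval widths),
    so any gradient step keeps the grid valid. (Illustrative choice.)"""
    w = np.exp(theta)                        # positive interval widths
    frac = np.cumsum(w) / np.sum(w)          # cumulative fractions in (0, 1]
    return np.concatenate(([T], T - (T - t_min) * frac))

def surrogate_loss(theta, T=1.0, t_min=0.01):
    """Euler-solve the toy ODE dx/dt = -t*x backward from x(T) = 1 on the
    learned grid, then score against the exact solution at t_min.
    The 'model' (the ODE right-hand side) is fixed; only the grid learns."""
    ts = timesteps(theta, T, t_min)
    x = 1.0
    for k in range(len(ts) - 1):
        dt = ts[k + 1] - ts[k]               # negative: integrating T -> t_min
        x = x + dt * (-ts[k] * x)            # one Euler step
    exact = np.exp((T**2 - t_min**2) / 2.0)  # closed-form solution at t_min
    return (x - exact) ** 2

def grad_fd(theta, eps=1e-5):
    """Central finite-difference gradient (stand-in for autodiff)."""
    g = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = eps
        g[i] = (surrogate_loss(theta + e) - surrogate_loss(theta - e)) / (2 * eps)
    return g

theta = np.zeros(10)                         # 10-step grid, uniform at init
loss_init = surrogate_loss(theta)
for _ in range(500):                         # plain gradient descent on the grid
    theta -= 2.0 * grad_fd(theta)
loss_opt = surrogate_loss(theta)
```

Under these toy assumptions, optimizing only where the solver evaluates the ODE — never the ODE itself — reduces the discretization error of the uniform grid, which is the decoupling the summary above describes.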