Enhancing Diffusion Models Efficiency by Disentangling Total-Variance and Signal-to-Noise Ratio

📅 2025-02-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Diffusion models face a quality degradation bottleneck in few-step sampling due to the entanglement of total variance (TV) and signal-to-noise ratio (SNR). This work proposes a TV/SNR decoupled noise scheduling framework, enabling independent control of these two factors. The analysis shows that existing schedules whose TV explodes can be improved by fixing TV to a constant while retaining the same SNR schedule. The framework further generalizes the SNR schedule of optimal transport flow matching, and is evaluated on both molecular structure generation and image synthesis. Experiments demonstrate: (i) in 3D molecular generation, chemically valid structures are stably produced within only 4–8 steps; (ii) in image generation, uniform time stepping achieves performance on par with the highly tailored EDM sampler, matching FID and Inception Score (IS) while accelerating sampling by 2–5×.

📝 Abstract
The long sampling time of diffusion models remains a significant bottleneck, which can be mitigated by reducing the number of diffusion time steps. However, the quality of samples with fewer steps is highly dependent on the noise schedule, i.e., the specific manner in which noise is introduced and the signal is reduced at each step. Although prior work has improved upon the original variance-preserving and variance-exploding schedules, these approaches $\textit{passively}$ adjust the total variance, without direct control over it. In this work, we propose a novel total-variance/signal-to-noise-ratio disentangled (TV/SNR) framework, where TV and SNR can be controlled independently. Our approach reveals that different existing schedules, where the TV explodes exponentially, can be $\textit{improved}$ by setting a constant TV schedule while preserving the same SNR schedule. Furthermore, generalizing the SNR schedule of optimal transport flow matching significantly improves performance in molecular structure generation, achieving few-step generation of stable molecules. A similar tendency is observed in image generation, where our approach with a uniform diffusion time grid performs comparably to the highly tailored EDM sampler.
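To make the idea of controlling TV and SNR independently concrete, here is a minimal sketch. It is not the authors' implementation. It assumes a forward process of the form x_t = a(t) x_0 + b(t) eps with unit-variance data and noise, and it assumes the definitions TV = a^2 + b^2 and SNR = a^2 / b^2; the abstract does not spell these out. The helper name `scales_from_tv_snr` and the values of `sigma_min` and `sigma_max` are hypothetical.

```python
import numpy as np

def scales_from_tv_snr(tv, snr):
    """Return (a, b) for x_t = a * x_0 + b * eps given TV and SNR.

    Assumed definitions (not taken from the source): TV = a^2 + b^2 and
    SNR = a^2 / b^2, with unit-variance data and noise.
    """
    tv = np.asarray(tv, dtype=float)
    snr = np.asarray(snr, dtype=float)
    a = np.sqrt(tv * snr / (1.0 + snr))  # signal scale
    b = np.sqrt(tv / (1.0 + snr))        # noise scale
    return a, b

# A variance-exploding style schedule: a = 1, b = sigma(t) grows geometrically.
t = np.linspace(0.0, 1.0, 5)
sigma_min, sigma_max = 0.01, 100.0       # hypothetical values
sigma = sigma_min * (sigma_max / sigma_min) ** t
snr_ve = 1.0 / sigma**2                  # SNR of the original schedule
tv_ve = 1.0 + sigma**2                   # TV of the original schedule (explodes)

# Feeding the schedule's own TV and SNR back in recovers a = 1, b = sigma.
a_ve, b_ve = scales_from_tv_snr(tv_ve, snr_ve)

# Same SNR schedule, but with TV held constant at 1.
a_c, b_c = scales_from_tv_snr(np.ones_like(t), snr_ve)
```

Under these assumptions, `a_c` and `b_c` describe a schedule that reaches the same noise level relative to the signal at every time step as the original, while the overall scale of x_t stays fixed instead of growing with `sigma`.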
Problem

Research questions and friction points this paper is trying to address.

Reducing diffusion model sampling time
Controlling variance and SNR independently
Improving molecular and image generation efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Disentangles TV and SNR control
Improves schedules with constant TV
Enhances molecular and image generation