Noise Conditional Variational Score Distillation

📅 2025-06-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the low sampling efficiency of diffusion models by proposing a variational score distillation framework that distills a pre-trained diffusion model into a generative denoiser that generalizes across the full noise schedule. Theoretically, it establishes the equivalence between the unconditional score function and the score of the denoising posterior distribution. Methodologically, it introduces noise-conditioned modeling and a generative denoiser architecture, enabling zero-shot probabilistic inference, fast single-step sampling, and quality-controllable multi-step generation. In class-conditional image generation, scaling the test-time compute lets the method outperform the teacher diffusion model and match substantially larger consistency models; on inverse problems it achieves record-breaking LPIPS scores with significantly fewer function evaluations (NFEs) than diffusion-based methods.
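The claimed connection between the unconditional score and the denoising-posterior score can be illustrated with a standard Bayes-rule identity (a generic sketch, not necessarily the paper's exact formulation; assume Gaussian noising $x_t = x_0 + \sigma \epsilon$, $\epsilon \sim \mathcal{N}(0, I)$):

$$
\nabla_{x_0} \log p(x_0 \mid x_t)
= \nabla_{x_0} \log p(x_0) + \nabla_{x_0} \log p(x_t \mid x_0)
= \nabla_{x_0} \log p(x_0) + \frac{x_t - x_0}{\sigma^2},
$$

so the score of the denoising posterior decomposes into the unconditional (data) score plus a closed-form Gaussian term, which is what allows a pre-trained score model to supervise a sampler of the posterior.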

📝 Abstract
We propose Noise Conditional Variational Score Distillation (NCVSD), a novel method for distilling pretrained diffusion models into generative denoisers. We achieve this by revealing that the unconditional score function implicitly characterizes the score function of denoising posterior distributions. By integrating this insight into the Variational Score Distillation (VSD) framework, we enable scalable learning of generative denoisers capable of approximating samples from the denoising posterior distribution across a wide range of noise levels. The proposed generative denoisers exhibit desirable properties that allow fast generation while preserving the benefits of iterative refinement: (1) fast one-step generation through sampling from pure Gaussian noise at high noise levels; (2) improved sample quality by scaling the test-time compute with multi-step sampling; and (3) zero-shot probabilistic inference for flexible and controllable sampling. We evaluate NCVSD through extensive experiments, including class-conditional image generation and inverse problem solving. By scaling the test-time compute, our method outperforms teacher diffusion models and is on par with consistency models of larger sizes. Additionally, with significantly fewer NFEs than diffusion-based methods, we achieve record-breaking LPIPS on inverse problems.
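The two sampling modes the abstract describes (one-step generation and multi-step refinement) can be sketched as follows. This is a minimal illustration under assumed interfaces: `denoiser(x, sigma)` is a hypothetical stand-in for the learned generative denoiser, assumed to draw a sample from the denoising posterior p(x0 | x_t) at noise level sigma, and the re-noising schedule is a generic choice rather than the paper's exact procedure.

```python
import numpy as np

def multistep_sample(denoiser, sigmas, shape, rng=None):
    """Sample with a generative denoiser across a decreasing noise schedule.

    `denoiser(x, sigma)` is assumed to return a sample from the denoising
    posterior p(x0 | x_t = x) at noise level sigma (hypothetical interface).
    """
    rng = np.random.default_rng(rng)
    # One-step generation: start from pure Gaussian noise at the highest
    # noise level and denoise once.
    x = sigmas[0] * rng.standard_normal(shape)
    x0 = denoiser(x, sigmas[0])
    # Optional multi-step refinement: re-noise the current estimate to a
    # lower noise level and denoise again, trading extra NFEs for quality.
    for sigma in sigmas[1:]:
        x = x0 + sigma * rng.standard_normal(shape)
        x0 = denoiser(x, sigma)
    return x0

# Example: three-step sampling with a toy denoiser (illustrative only).
toy = lambda x, sigma: x / (1.0 + sigma ** 2)
sample = multistep_sample(toy, sigmas=[80.0, 10.0, 1.0], shape=(4,), rng=0)
```

A single element in `sigmas` reduces to one-step generation; appending lower noise levels spends additional NFEs on iterative refinement, which is the test-time-compute scaling knob the abstract refers to.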
Problem

Research questions and friction points this paper is trying to address.

Distilling diffusion models into generative denoisers
Learning scalable denoisers for various noise levels
Enabling fast generation with iterative refinement benefits
Innovation

Methods, ideas, or system contributions that make the work stand out.

Noise Conditional Variational Score Distillation
Scalable learning of generative denoisers
Fast one-step and multi-step sampling
Xinyu Peng
Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
Ziyang Zheng
Shanghai Jiao Tong University
Signal Processing · Inverse Problems · Photonic Computing
Yaoming Wang
Meituan Inc, China
Han Li
Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai, China
Nuowen Kan
Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai, China
Wenrui Dai
Shanghai Jiao Tong University
Predictive Modeling · Image/Video Coding · Signal Processing
Chenglin Li
Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai, China
Junni Zou
Professor, Shanghai Jiao Tong University
Multimedia communications - network resource optimization
Hongkai Xiong
Distinguished Professor, Shanghai Jiao Tong University
Image and Video Coding · Signal Processing · Multimedia Communication · Vision and Learning