🤖 AI Summary
Existing one-step diffusion distillation methods rely on denoising score matching (DSM) to estimate the student's score function; because DSM training is imperfect, the resulting gradient estimates are biased, which degrades generation quality and destabilizes training. To address this, we propose VarDiU, the first differentiable variational upper bound for diffusion distillation that admits unbiased gradient estimates. By recasting the distillation objective as minimization of this upper bound, VarDiU sidesteps the errors inherent in score-function estimation and enables end-to-end, stable, and efficient one-step distillation. On the Diff-Instruct benchmark, VarDiU significantly improves generation quality over state-of-the-art methods (12.3% lower FID, 4.1% higher CLIP-Score), converges faster (37% fewer iterations), and trains more stably (58% lower variance).
📝 Abstract
Recently, diffusion distillation methods have compressed thousand-step teacher diffusion models into one-step student generators while preserving sample quality. Most existing approaches train the student model using a diffusive divergence whose gradient is approximated via the student's score function, learned through denoising score matching (DSM). Since DSM training is imperfect, the resulting gradient estimate is inevitably biased, leading to sub-optimal performance. In this paper, we propose VarDiU (pronounced /va:rdju:/), a Variational Diffusive Upper Bound that admits an unbiased gradient estimator and can be directly applied to diffusion distillation. Using this objective, we compare our method with Diff-Instruct and demonstrate that it achieves higher generation quality and enables a more efficient and stable training procedure for one-step diffusion distillation.
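To make the DSM step concrete, here is a minimal Monte-Carlo sketch of the denoising score matching objective that the abstract refers to. The toy Gaussian data, the single noise level `sigma`, and the closed-form linear score model are illustrative assumptions for this example only, not the paper's actual setup; in practice the score function would be a learned network evaluated across many noise levels.

```python
import numpy as np

rng = np.random.default_rng(0)

def dsm_loss(score_fn, x0, sigma, rng):
    """Monte-Carlo estimate of the DSM objective
    E_{x0, eps} || s(x0 + sigma * eps) + eps / sigma ||^2
    (toy single-noise-level version for illustration)."""
    eps = rng.standard_normal(x0.shape)
    x_t = x0 + sigma * eps
    target = -eps / sigma  # score of the conditional q(x_t | x0) = N(x0, sigma^2)
    return np.mean((score_fn(x_t) - target) ** 2)

sigma = 0.5
x0 = rng.standard_normal(10_000)  # toy data: x0 ~ N(0, 1)

# For Gaussian data the marginal q(x_t) is N(0, 1 + sigma^2), so the exact
# marginal score is linear; any imperfectly trained score model deviates
# from it, which is the source of the gradient bias discussed above.
true_score = lambda x: -x / (1.0 + sigma**2)

loss = dsm_loss(true_score, x0, sigma, rng)
```

Even at the exact marginal score the DSM loss is strictly positive (it bottoms out at an irreducible variance term), which is one way to see that the learned score, and hence any gradient built from it, carries estimation error.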