Elucidating the Preconditioning in Consistency Distillation

📅 2025-02-05
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work addresses suboptimal performance in consistency distillation caused by hand-crafted preconditioning. It establishes the first theoretical analysis of preconditioning in consistency distillation, elucidating its design criteria and its connection to the teacher ODE trajectory. Building on this analysis, the authors propose Analytic-Precond, a principled method that analytically optimizes the preconditioning according to the consistency gap (the gap between the teacher denoiser and the optimal student denoiser) on a generalized teacher ODE. Analytic-Precond facilitates the learning of trajectory jumpers, improves the alignment of the student trajectory with the teacher's, and achieves 2× to 3× training acceleration of consistency trajectory models in multi-step generation across various datasets.

πŸ“ Abstract
Consistency distillation is a prevalent way to accelerate diffusion models, adopted in consistency (trajectory) models, in which a student model is trained to traverse backward on the probability flow (PF) ordinary differential equation (ODE) trajectory determined by the teacher model. Preconditioning is a vital technique for stabilizing consistency distillation, by linearly combining the input data and the network output with pre-defined coefficients as the consistency function. It imposes the boundary condition of consistency functions without restricting the form and expressiveness of the neural network. However, previous preconditionings are hand-crafted and may be suboptimal choices. In this work, we offer the first theoretical insights into the preconditioning in consistency distillation, by elucidating its design criteria and its connection to the teacher ODE trajectory. Based on these analyses, we further propose a principled way dubbed \textit{Analytic-Precond} to analytically optimize the preconditioning according to the consistency gap (defined as the gap between the teacher denoiser and the optimal student denoiser) on a generalized teacher ODE. We demonstrate that Analytic-Precond can facilitate the learning of trajectory jumpers, enhance the alignment of the student trajectory with the teacher's, and achieve $2\times$ to $3\times$ training acceleration of consistency trajectory models in multi-step generation across various datasets.
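To make the parameterization concrete, the consistency-models literature commonly writes the preconditioned consistency function as a linear combination of the input and the free-form network output; the following is a generic sketch of that convention, not the paper's generalized form:

$f_\theta(\mathbf{x}_t, t) = c_{\text{skip}}(t)\,\mathbf{x}_t + c_{\text{out}}(t)\,F_\theta(\mathbf{x}_t, t), \qquad c_{\text{skip}}(\epsilon) = 1, \quad c_{\text{out}}(\epsilon) = 0,$

so the boundary condition $f_\theta(\mathbf{x}_\epsilon, \epsilon) = \mathbf{x}_\epsilon$ holds at the initial time $\epsilon$ regardless of what the free-form network $F_\theta$ outputs. Hand-crafted schedules for $c_{\text{skip}}$ and $c_{\text{out}}$ are exactly what Analytic-Precond replaces with analytically optimized coefficients.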
Problem

Research questions and friction points this paper is trying to address.

Optimizing preconditioning in consistency distillation
Enhancing student-teacher trajectory alignment
Accelerating training of consistency trajectory models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Analytic-Precond optimizes preconditioning analytically
Enhances alignment of student and teacher trajectories
Achieves 2× to 3× training acceleration (see the sketch after this list)
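For orientation, here is a minimal sketch of a generic consistency-distillation training step, showing where the preconditioning coefficients enter. Everything in it is an illustrative assumption rather than the paper's actual API: the `teacher_ode_step` helper, the VE-style noising `x0 + t * noise`, the squared-error metric, and the plain stop-gradient target (an EMA copy of the student is typical in practice). The analytic coefficient optimization of Analytic-Precond itself is not reproduced.

```python
import torch

def consistency_fn(net, x_t, t, c_skip, c_out):
    # Preconditioned consistency function:
    # f(x_t, t) = c_skip(t) * x_t + c_out(t) * net(x_t, t)
    return c_skip(t) * x_t + c_out(t) * net(x_t, t)

def cd_loss(student, teacher_ode_step, x0, t_next, t_cur, c_skip, c_out):
    # One consistency-distillation step. VE-style noising x_t = x_0 + t * eps
    # is an assumption; noise schedules vary across papers.
    noise = torch.randn_like(x0)
    x_next = x0 + t_next * noise
    with torch.no_grad():
        # One teacher PF-ODE step backward from t_{n+1} to t_n (hypothetical helper).
        x_cur = teacher_ode_step(x_next, t_next, t_cur)
        # Stop-gradient target; an EMA copy of the student is typical, omitted here.
        target = consistency_fn(student, x_cur, t_cur, c_skip, c_out)
    pred = consistency_fn(student, x_next, t_next, c_skip, c_out)
    return (pred - target).pow(2).mean()
```

In this picture, Analytic-Precond would set `c_skip` and `c_out` per timestep according to the consistency gap on a generalized teacher ODE, rather than fixing them by hand.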
Kaiwen Zheng
Dept. of Comp. Sci. & Tech., Institute for AI, BNRist Center, THBI Lab, Tsinghua University, Beijing, China
Guande He
Ph.D. Student, University of Texas at Austin
Machine Learning · Foundation Model · Deep Generative Models
Jianfei Chen
Associate Professor, Tsinghua University
Machine Learning
Fan Bao
ShengShu
Machine Learning
Jun Zhu
Dept. of Comp. Sci. & Tech., Institute for AI, BNRist Center, THBI Lab, Tsinghua University, Beijing, China; Pazhou Lab (Huangpu), Guangzhou, China