🤖 AI Summary
Existing score distillation methods use only the pretrained diffusion model's final output as the teacher, ignoring consistency between the teacher's and student's convergence trajectories, which causes a score mismatch in the early stage of training. This paper proposes Distribution Backtracking Distillation (DisBack), a framework that explicitly models and leverages the teacher's full degradation path. By recording the teacher's stepwise degradation toward the initial student and then backtracking over the recorded intermediate distributions in reverse, DisBack guides the student generator to align with the teacher's entire convergence trajectory. The method adopts a two-stage paradigm, degradation-path recording followed by reverse distribution backtracking, enabling trajectory-level rather than merely endpoint matching. On ImageNet 64×64, DisBack achieves an FID of 1.38, demonstrating faster convergence and strong generation quality. Moreover, it is lightweight, easy to implement, and compatible with existing distillation pipelines.
📝 Abstract
Accelerating the sampling speed of diffusion models remains a significant challenge. Recent score distillation methods distill a heavy teacher model into a student generator to achieve one-step generation, optimizing the student by computing the difference between the two score functions on the samples it generates. However, a score mismatch arises in the early stage of the distillation process, because existing methods mainly use the endpoint of the pre-trained diffusion model as the teacher, overlooking the importance of the convergence trajectory between the student generator and the teacher model. To address this issue, we extend the score distillation process by introducing the entire convergence trajectory of the teacher model and propose Distribution Backtracking Distillation (DisBack). DisBack is composed of two stages: Degradation Recording and Distribution Backtracking. Degradation Recording obtains the convergence trajectory of the teacher model by recording the degradation path from the trained teacher model to the untrained initial student generator. The degradation path implicitly represents the teacher model's intermediate distributions, and its reverse can be viewed as the convergence trajectory from the student generator to the teacher model. Distribution Backtracking then trains the student generator to backtrack the intermediate distributions along this path, approximating the teacher model's convergence trajectory. Extensive experiments show that DisBack converges faster and better than existing distillation methods while achieving comparable generation performance, with an FID score of 1.38 on the ImageNet 64×64 dataset. Notably, DisBack is easy to implement and can be generalized to existing distillation methods to boost performance. Our code is publicly available at https://github.com/SYZhang0805/DisBack.
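To make the two-stage procedure concrete, here is a minimal toy sketch of the idea, not the paper's implementation. It assumes 1-D Gaussian distributions with analytic scores (the score of N(μ, 1) at x is −(x − μ)) and a one-step generator g(z) = θ + z, and it replaces the real recorded diffusion-model checkpoints with a hypothetical linear interpolation of means. Stage 1 records a degradation path from the teacher down to the initial student; Stage 2 traverses that path in reverse, applying the score-difference update on student samples at each intermediate target.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: teacher distribution N(mu_teacher, 1); the one-step student
# generator g(z) = theta + z produces samples from N(theta, 1).
# Scores are analytic: score of N(mu, 1) at x is -(x - mu).
mu_teacher = 4.0
theta = 0.0  # untrained student parameter (hypothetical initialization)

# Stage 1 (Degradation Recording, sketched): record intermediate
# distributions along a path from the teacher down to the initial
# student. Here the path is a linear interpolation of means; the real
# method records checkpoints while degrading the teacher model.
n_nodes = 5
degradation_path = [mu_teacher + (theta - mu_teacher) * k / n_nodes
                    for k in range(n_nodes + 1)]  # teacher -> init

# Stage 2 (Distribution Backtracking, sketched): traverse the recorded
# path in reverse, using each intermediate distribution as the teacher
# target, and update theta with the score difference (target score
# minus student score) averaged over student samples.
lr = 0.5
for mu_target in reversed(degradation_path):  # init -> teacher
    for _ in range(50):
        x = theta + rng.standard_normal(256)            # student samples
        grad = np.mean(-(x - mu_target) - (-(x - theta)))  # s_tgt - s_stu
        theta += lr * grad

print(round(theta, 2))  # theta converges to mu_teacher: 4.0
```

In this toy case the score difference reduces to μ_target − θ, so each backtracking segment pulls the student to the next intermediate distribution before the final segment matches the teacher itself; this mirrors how DisBack avoids the early-stage score mismatch that arises when the endpoint teacher is used from the start.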