Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better

📅 2024-04-02
🏛️ arXiv.org
📈 Citations: 3
Influential: 0
🤖 AI Summary
Intermediate checkpoints saved while training diffusion models (DMs) and consistency models (CMs) are usually discarded, even though high-quality weights often lie in basins that SGD fails to reach. Method: LCSC (Linear Combination of Saved Checkpoints) uses evolutionary search to learn linear weighting coefficients over checkpoints along the training trajectory, producing a combined model with no extra computational cost at deployment. Contribution/Results: LCSC improves both generation quality and inference efficiency. On CIFAR-10 and ImageNet-64 it achieves up to a 23× training speedup, reduces DM sampling from 15 to 9 NFE, and lets single-step CM inference outperform the two-step baseline, showing that trajectory-weighted averaging can reach weight regions that SGD alone does not.

📝 Abstract
Diffusion Models (DM) and Consistency Models (CM) are two types of popular generative models with good generation quality on various tasks. When training DM and CM, intermediate weight checkpoints are not fully utilized: only the last converged checkpoint is used. In this work, we find that high-quality model weights often lie in a basin which cannot be reached by SGD but can be obtained by proper checkpoint averaging. Based on these observations, we propose LCSC, a simple but effective and efficient method to enhance the performance of DM and CM by combining checkpoints along the training trajectory with coefficients deduced from evolutionary search. We demonstrate the value of LCSC through two use cases: **(a) Reducing training cost.** With LCSC, we only need to train DM/CM with fewer iterations and/or smaller batch sizes to obtain sample quality comparable to the fully trained model. For example, LCSC achieves considerable training speedups for CM (23× on CIFAR-10 and 15× on ImageNet-64). **(b) Enhancing pre-trained models.** Assuming full training is already done, LCSC can further improve the generation quality or speed of the final converged models. For example, LCSC achieves better performance with 1 function evaluation (NFE) than the base model with 2 NFE on consistency distillation, and decreases the NFE of DM from 15 to 9 while maintaining generation quality on CIFAR-10. Our code is available at https://github.com/imagination-research/LCSC.
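The core idea, linearly combining saved checkpoints with coefficients found by evolutionary search, can be sketched as follows. This is a minimal illustration, not the paper's implementation: checkpoints are represented as plain dicts of arrays, the `fitness` function is a stand-in for a generation-quality metric (the paper evaluates sample quality, e.g. FID), and the sum-to-one constraint on coefficients is an assumption of this sketch.

```python
import numpy as np

def combine_checkpoints(checkpoints, coeffs):
    """Linearly combine checkpoints: theta = sum_i c_i * theta_i.

    `checkpoints` is a list of dicts mapping parameter names to arrays;
    `coeffs` gives one scalar weight per checkpoint.
    """
    combined = {}
    for name in checkpoints[0]:
        combined[name] = sum(c * ckpt[name] for c, ckpt in zip(coeffs, checkpoints))
    return combined

def evolutionary_search(checkpoints, fitness, generations=50, pop_size=16,
                        sigma=0.1, seed=0):
    """Toy evolutionary search over combination coefficients.

    `fitness` scores a combined weight dict (higher is better); here it is
    a placeholder for a real generation-quality metric.
    """
    rng = np.random.default_rng(seed)
    n = len(checkpoints)
    best = np.full(n, 1.0 / n)  # start from uniform averaging
    best_score = fitness(combine_checkpoints(checkpoints, best))
    for _ in range(generations):
        for _ in range(pop_size):
            cand = best + rng.normal(0.0, sigma, size=n)  # Gaussian mutation
            if abs(cand.sum()) < 1e-8:
                continue
            cand /= cand.sum()  # keep coefficients summing to 1 (sketch assumption)
            score = fitness(combine_checkpoints(checkpoints, cand))
            if score > best_score:
                best, best_score = cand, score
    return best, best_score
```

Since the search only moves when the score improves, the combined model is never worse (under the chosen metric) than plain uniform checkpoint averaging, which is the intuition behind using search rather than a fixed averaging rule such as EMA.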
Problem

Research questions and friction points this paper is trying to address.

Enhance DM and CM performance
Reduce training cost significantly
Improve pre-trained models' quality
Innovation

Methods, ideas, or system contributions that make the work stand out.

Checkpoint averaging enhances model performance.
Evolutionary search optimizes checkpoint coefficients.
LCSC reduces training costs significantly.