Truncated Consistency Models

📅 2024-10-18
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Consistency models trained over the full-time PF-ODE suffer from limited one-step generation performance and high training complexity. To address this, the paper proposes a truncated-time consistency training framework that restricts consistency learning to the critical noise-to-data transition regime, improving both sampling efficiency and generation quality. The authors introduce a new parameterization of the consistency function and a two-stage training procedure that prevents truncated-time training from collapsing to a trivial solution. The method achieves state-of-the-art one-step and two-step FID scores on CIFAR-10 and ImageNet 64×64, outperforming iCT-deep and other strong baselines while using networks more than 2× smaller.

📝 Abstract
Consistency models have recently been introduced to accelerate sampling from diffusion models by directly predicting the solution (i.e., data) of the probability flow ODE (PF ODE) from initial noise. However, the training of consistency models requires learning to map all intermediate points along PF ODE trajectories to their corresponding endpoints. This task is much more challenging than the ultimate objective of one-step generation, which only concerns the PF ODE's noise-to-data mapping. We empirically find that this training paradigm limits the one-step generation performance of consistency models. To address this issue, we generalize consistency training to the truncated time range, which allows the model to ignore denoising tasks at earlier time steps and focus its capacity on generation. We propose a new parameterization of the consistency function and a two-stage training procedure that prevents the truncated-time training from collapsing to a trivial solution. Experiments on CIFAR-10 and ImageNet $64\times 64$ datasets show that our method achieves better one-step and two-step FIDs than the state-of-the-art consistency models such as iCT-deep, using more than $2\times$ smaller networks. Project page: https://truncated-cm.github.io/
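The core idea of truncated consistency training — drawing adjacent time pairs only from a restricted range $[t', T]$ rather than the full $(0, T]$, so the model's capacity is spent on the noise-to-data regime — can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual implementation: the toy consistency function `f`, the EDM-style skip/out coefficients, the VE-style perturbation `x0 + t*z`, and the fixed pair spacing are all assumptions, and the paper's specific parameterization and two-stage procedure are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x, t, w):
    """Toy consistency function (an assumption, not the paper's network):
    an EDM-style parameterization c_skip(t)*x + c_out(t)*F(x), where the
    'network' F is just a linear map w @ x for illustration."""
    sigma_data = 0.5
    c_skip = sigma_data**2 / (t**2 + sigma_data**2)
    c_out = sigma_data * t / np.sqrt(t**2 + sigma_data**2)
    return c_skip * x + c_out * (w @ x)

def truncated_ct_loss(w, w_ema, x0, t_prime=1.0, T=80.0, n_pairs=64):
    """One Monte-Carlo estimate of a truncated consistency-training loss.

    Key difference from standard consistency training: adjacent time pairs
    are sampled only from the truncated range [t_prime, T], so the model
    never trains on denoising tasks at earlier times t < t_prime.
    """
    d = x0.shape[0]
    losses = []
    for _ in range(n_pairs):
        # Sample a time pair restricted to the truncated range.
        t_hi = rng.uniform(t_prime, T)
        t_lo = max(t_prime, t_hi - 0.1 * (T - t_prime))
        # Two points on the same trajectory (VE-style perturbation assumed).
        z = rng.standard_normal(d)
        x_hi = x0 + t_hi * z
        x_lo = x0 + t_lo * z
        # Consistency target: the student's output at t_hi should match the
        # EMA teacher's output at the earlier time t_lo.
        diff = f(x_hi, t_hi, w) - f(x_lo, t_lo, w_ema)
        losses.append(np.mean(diff**2))
    return float(np.mean(losses))
```

In the paper, a two-stage procedure prevents this truncated objective from collapsing to a trivial solution; in this sketch the role of the stage-one model is crudely played by the EMA weights `w_ema`, which anchor the predictions at the lower end of the time range.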
Problem

Research questions and friction points this paper is trying to address.

Continuous Consistency Models
Image Generation
Simplification
Innovation

Methods, ideas, or system contributions that make the work stand out.

Simplified Continuous Consistency Model
Enhanced Training Methodology
Efficient Image Generation