🤖 AI Summary
To address training instability and degraded generation quality in latent consistency models (LCMs) caused by outlier interference, this paper proposes a robust training framework enabling high-fidelity one-step and two-step text-to-image and text-to-video generation. The key contributions are: (1) the first adoption of Cauchy loss, replacing pseudo-Huber loss, to enhance robustness against outliers in latent-space optimization; (2) coupling an early-timestep diffusion loss with an optimal transport (OT) objective to improve alignment between predicted and target latent distributions; and (3) an adaptive scaling-c scheduler and Non-scaling LayerNorm to stabilize training dynamics. Experiments demonstrate that the method significantly narrows the performance gap between LCMs and diffusion models, achieving strong fidelity and sampling efficiency across text-conditioned generation tasks.
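The loss substitution in contribution (1) can be sketched as follows. This is a minimal illustration of the two robust losses, not the paper's released code; the constant `c` and the `c**2/2` normalization on the Cauchy loss are illustrative assumptions (the paper schedules `c` adaptively).

```python
import numpy as np

def pseudo_huber(residual, c=0.03):
    """Pseudo-Huber loss: quadratic near zero, *linear* in the tails."""
    return np.sqrt(residual**2 + c**2) - c

def cauchy(residual, c=0.03):
    """Cauchy (Lorentzian) loss: quadratic near zero, *logarithmic* in
    the tails, so impulsive latent-space outliers contribute far less
    to the training gradient."""
    return (c**2 / 2) * np.log1p((residual / c)**2)

# For a large outlier residual the Cauchy loss is orders of magnitude
# smaller than the pseudo-Huber loss, which keeps outliers from
# dominating the consistency-training update.
r = np.array([0.0, 0.01, 1.0, 10.0])
print(pseudo_huber(r))
print(cauchy(r))
```

Both losses agree near zero (approximately quadratic), so the substitution mainly changes how heavily outliers are weighted.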
📝 Abstract
Consistency models are a new family of generative models capable of producing high-quality samples in either a single step or multiple steps. Recently, consistency models have demonstrated impressive performance, achieving results on par with diffusion models in the pixel space. However, the success of scaling consistency training to large-scale datasets, particularly for text-to-image and video generation tasks, is determined by performance in the latent space. In this work, we analyze the statistical differences between pixel and latent spaces, discovering that latent data often contains highly impulsive outliers, which significantly degrade the performance of iCT (improved consistency training) in the latent space. To address this, we replace pseudo-Huber losses with Cauchy losses, effectively mitigating the impact of outliers. Additionally, we introduce a diffusion loss at early timesteps and employ optimal transport (OT) coupling to further enhance performance. Lastly, we introduce the adaptive scaling-$c$ scheduler to manage the robust training process and adopt Non-scaling LayerNorm in the architecture to better capture the statistics of the features and reduce outlier impact. With these strategies, we successfully train latent consistency models capable of high-quality sampling with one or two steps, significantly narrowing the performance gap between latent consistency and diffusion models. The implementation is released here: https://github.com/quandao10/sLCT/
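The architectural change mentioned last, Non-scaling LayerNorm, can be sketched as below. This assumes the term means standard layer normalization with the learnable gain removed, and is written in plain NumPy rather than the released PyTorch code.

```python
import numpy as np

def layernorm(x, gamma, beta, eps=1e-5):
    """Standard LayerNorm: the learnable gain `gamma` can re-amplify
    outlier channels after normalization."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

def non_scaling_layernorm(x, eps=1e-5):
    """LayerNorm without learnable affine parameters: activations stay
    zero-mean and unit-variance, bounding the downstream influence of
    any single impulsive feature."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

# One impulsive outlier feature in the input; the normalized output is
# zero-mean and approximately unit-variance per token regardless.
x = np.array([[1.0, 2.0, 3.0, 100.0]])
y = non_scaling_layernorm(x)
```

In PyTorch this roughly corresponds to `nn.LayerNorm(dim, elementwise_affine=False)`; the exact placement in the paper's architecture follows the released repository.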