DC-AE 1.5: Accelerating Diffusion Model Convergence with Structured Latent Space

📅 2025-08-01
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Increasing the number of latent channels in autoencoders improves reconstruction fidelity but severely impedes diffusion model convergence, limiting both generative performance and the practical deployment of highly compressive autoencoders. To address this, we propose a structured latent space design: the leading latent channels explicitly model global structure, while trailing channels preserve fine-grained local details. We further introduce a structural-channel-aware diffusion training objective that accelerates convergence by prioritizing structural consistency. Integrating this framework with a deep-compression autoencoder (DC-AE-1.5-f64c128) and a multi-objective optimization strategy, our method achieves superior generative quality on ImageNet 512×512 compared to DC-AE-f32c32, while reducing training time by 4×. This work effectively breaks the long-standing trade-off between quality and efficiency in latent diffusion models, enabling high-fidelity synthesis under extreme spatial compression.
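The structural-channel-aware objective described above can be sketched as a standard denoising loss plus an extra loss term restricted to the leading "structure" channels. This is a minimal illustration under assumed details, not the authors' implementation: the split point `n_struct` and the weight `struct_weight` are hypothetical.

```python
import numpy as np

def augmented_diffusion_loss(pred_noise, true_noise, n_struct=32, struct_weight=1.0):
    """Plain diffusion MSE plus an extra MSE on the leading 'structure'
    channels (channels-first layout: batch, channel, H, W).
    n_struct and struct_weight are illustrative guesses, not paper values."""
    # Base objective: MSE over all latent channels.
    base = np.mean((pred_noise - true_noise) ** 2)
    # Extra objective: MSE on the leading structural channels only,
    # pushing the model to get global structure right early in training.
    struct = np.mean((pred_noise[:, :n_struct] - true_noise[:, :n_struct]) ** 2)
    return base + struct_weight * struct
```

With `struct_weight > 0`, gradients on the structural channels are effectively upweighted relative to the detail channels, which is one simple way to realize "prioritizing structural consistency."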

πŸ“ Abstract
We present DC-AE 1.5, a new family of deep compression autoencoders for high-resolution diffusion models. Increasing the autoencoder's latent channel number is a highly effective approach for improving its reconstruction quality. However, it results in slow convergence for diffusion models, leading to poorer generation quality despite better reconstruction quality. This issue limits the quality upper bound of latent diffusion models and hinders the employment of autoencoders with higher spatial compression ratios. We introduce two key innovations to address this challenge: i) Structured Latent Space, a training-based approach to impose a desired channel-wise structure on the latent space with front latent channels capturing object structures and latter latent channels capturing image details; ii) Augmented Diffusion Training, an augmented diffusion training strategy with additional diffusion training objectives on object latent channels to accelerate convergence. With these techniques, DC-AE 1.5 delivers faster convergence and better diffusion scaling results than DC-AE. On ImageNet 512x512, DC-AE-1.5-f64c128 delivers better image generation quality than DC-AE-f32c32 while being 4x faster. Code: https://github.com/dc-ai-projects/DC-Gen.
Problem

Research questions and friction points this paper is trying to address.

Slow convergence in diffusion models with high-channel autoencoders
Trade-off between reconstruction quality and generation quality
Limitation in employing autoencoders with higher compression ratios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Structured Latent Space for channel-wise organization
Augmented Diffusion Training with extra objectives
Higher compression ratios with faster convergence
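For scale, the latent shapes implied by the autoencoder names (fN = N× spatial downsampling, cM = M latent channels) can be worked out with simple arithmetic. The helper below is an illustration of that naming convention, not code from the paper:

```python
def latent_shape(image_hw, f, c):
    """Latent spatial size and element count for an fN/cM autoencoder:
    the image is downsampled f-fold in each spatial dimension and
    encoded into c channels."""
    h, w = image_hw
    return (h // f, w // f, c), (h // f) * (w // f) * c

# DC-AE-f32c32 on ImageNet 512x512:   16x16x32 latent, 8192 elements.
# DC-AE-1.5-f64c128 on the same input: 8x8x128 latent, also 8192 elements,
# but with 4x fewer spatial positions (64 vs 256) for the diffusion model.
```

Note that the two configurations carry the same total latent budget; the f64 variant trades spatial resolution for channel depth, which is exactly the regime where the structured latent space is needed.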
Junyu Chen
NVIDIA
Dongyun Zou
NVIDIA
Wenkun He
NVIDIA
Junsong Chen
NVIDIA Research Intern, MMLab@HKU
Generative Model, Large Language Model
Enze Xie
NVIDIA Research, MMLab@HKU
Computer Vision, Generative AI
Song Han
NVIDIA
Han Cai
NVIDIA