Disentanglement in T-space for Faster and Distributed Training of Diffusion Models with Fewer Latent-states

📅 2025-08-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Traditional diffusion models rely on a large number of time steps to ensure asymptotic Gaussianity in the reverse process, resulting in inefficient training. This work challenges that assumption and proposes a T-space decoupled training paradigm: by optimizing the noise schedule, the diffusion process is fully decoupled into a single latent state, enabling independent and parallel training across all time steps. This eliminates sequential dependencies, facilitating distributed training and ensemble sampling. Experiments on ImageNet and CelebA demonstrate 4–6× faster convergence, with FID and LPIPS scores matching or surpassing those of baseline models, while significantly reducing computational overhead. The core contribution is the first realization of complete T-space decoupling into a single state—establishing a novel, highly efficient training paradigm for diffusion models.

📝 Abstract
We challenge a fundamental assumption of diffusion models, namely, that a large number of latent-states or time-steps is required for training so that the reverse generative process is close to a Gaussian. We first show that with careful selection of a noise schedule, diffusion models trained over a small number of latent states (i.e., $T \sim 32$) match the performance of models trained over a much larger number of latent states ($T \sim 1{,}000$). Second, we push this limit (on the minimum number of latent states required) to a single latent-state, which we refer to as complete disentanglement in T-space. We show that high-quality samples can be easily generated by the disentangled model obtained by combining several independently trained single latent-state models. We provide extensive experiments to show that the proposed disentangled model provides 4--6$\times$ faster convergence measured across a variety of metrics on two different datasets.
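The training paradigm described in the abstract (per-time-step denoisers trained independently, then chained at sampling time) can be illustrated with a toy sketch. Everything below is an assumption for illustration only: the 1-D Gaussian "dataset", the linear per-step denoisers, the small `T = 8`, and the `linspace` noise schedule stand in for the paper's optimized schedule and neural models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D dataset standing in for images: N(2.0, 0.5^2).
data = rng.normal(loc=2.0, scale=0.5, size=10_000)

# A small number of latent states, in the spirit of the few-step regime.
# This schedule is illustrative, not the paper's optimized one.
T = 8
betas = np.linspace(0.05, 0.5, T)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

def train_single_state_model(t):
    """Fit a denoiser for time-step t only. Nothing is shared across
    steps, so all T fits could run on independent workers."""
    eps = rng.normal(size=data.shape)
    x_t = np.sqrt(alpha_bar[t]) * data + np.sqrt(1.0 - alpha_bar[t]) * eps
    # Tiny linear model eps_hat = w * x_t + b, fit by least squares
    # (adequate here only because the toy data are Gaussian).
    A = np.stack([x_t, np.ones_like(x_t)], axis=1)
    w, b = np.linalg.lstsq(A, eps, rcond=None)[0]
    return w, b

# "Distributed" training: each single-state model is fit independently.
models = [train_single_state_model(t) for t in range(T)]

def sample(n):
    """Ancestral sampling that chains the independently trained models."""
    x = rng.normal(size=n)  # start from the standard-normal prior
    for t in reversed(range(T)):
        w, b = models[t]
        eps_hat = w * x + b
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:
            x = x + np.sqrt(betas[t]) * rng.normal(size=n)
    return x

samples = sample(5_000)
print(f"sample mean {samples.mean():.2f}, std {samples.std():.2f}")
```

Because the T fits share no parameters and no sequential state, each loop iteration over `t` could be dispatched to a separate machine, which is the property that enables the distributed training the summary describes.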
Problem

Research questions and friction points this paper is trying to address.

Reducing diffusion model training latent-states
Achieving disentanglement with minimal time-steps
Enabling faster distributed training convergence
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fewer latent-states diffusion training
Careful noise schedule selection
Combining single latent-state models