Heterogeneous Decentralized Diffusion Models

📅 2026-03-06
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the limitations of existing decentralized diffusion model training, which demands substantial computational resources and enforces a homogeneous training objective across all expert models, restricting participation flexibility. We propose the first decentralized framework that supports heterogeneous training objectives, allowing multiple experts to train in isolation, each with a distinct objective such as DDPM or Flow Matching. Through schedule-aware velocity space alignment, these heterogeneous experts can be fused seamlessly at inference time without synchronization or retraining. By integrating pre-trained checkpoint conversion with a lightweight AdaLN-Single architecture, our approach reduces compute on LAION-Aesthetics from 1,176 to 72 GPU-days (a 16× reduction) and cuts data usage by 14×. The resulting heterogeneous configuration achieves an FID of 11.88, outperforming its homogeneous counterpart (12.45) while improving prompt-conditioned diversity.

📝 Abstract
Training frontier-scale diffusion models often requires substantial computational resources concentrated in tightly coupled clusters, limiting participation to well-resourced institutions. While Decentralized Diffusion Models (DDM) enable training multiple experts in isolation, existing approaches require 1,176 GPU-days and homogeneous training objectives across all experts. We present an efficient framework that reduces resource requirements while supporting heterogeneous training objectives. Our approach combines three contributions: (1) a heterogeneous decentralized training paradigm that allows experts to use different objectives (DDPM and Flow Matching), unified at inference time via a deterministic schedule-aware conversion into a common velocity space without retraining; (2) pretrained checkpoint conversion from ImageNet-DDPM to Flow Matching objectives, accelerating convergence and enabling initialization without objective-specific pretraining; and (3) PixArt-alpha's efficient AdaLN-Single architecture, reducing parameters while maintaining quality. Experiments on LAION-Aesthetics show that, relative to the training scale reported for prior DDM work, our approach reduces compute from 1,176 to 72 GPU-days (16×) and data from 158M to 11M (14×). Under aligned inference settings, our heterogeneous 2DDPM:6FM configuration achieves better FID (11.88 vs. 12.45) and higher intra-prompt diversity (LPIPS 0.631 vs. 0.617) than the homogeneous 8FM baseline. By eliminating synchronization requirements and enabling mixed DDPM/FM objectives, our framework lowers infrastructure requirements for decentralized generative model training.
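To illustrate the kind of schedule-aware conversion the abstract describes, the sketch below maps a DDPM noise prediction into velocity space. This is a generic derivation from the standard forward process x_t = α_t·x_0 + σ_t·ε (where the probability-flow velocity is v_t = α̇_t·x_0 + σ̇_t·ε), not the paper's exact implementation; the function name and schedule arguments are hypothetical.

```python
def eps_to_velocity(x_t, eps_hat, alpha_t, sigma_t, d_alpha_t, d_sigma_t):
    """Deterministically convert a DDPM noise prediction to a velocity prediction.

    Assumes the forward process x_t = alpha_t * x_0 + sigma_t * eps, under which
    the velocity is v_t = d_alpha_t * x_0 + d_sigma_t * eps. Substituting the
    model's implied clean sample x_0 = (x_t - sigma_t * eps_hat) / alpha_t yields
    a closed-form, schedule-aware mapping that needs no retraining.
    """
    x0_hat = (x_t - sigma_t * eps_hat) / alpha_t
    return d_alpha_t * x0_hat + d_sigma_t * eps_hat


# Sanity check on a rectified-flow schedule (alpha_t = 1 - t, sigma_t = t),
# where the true velocity is eps - x_0. With x_0 = 2, eps = 1, t = 0.5:
x_t = 0.5 * 2.0 + 0.5 * 1.0            # = 1.5
v = eps_to_velocity(x_t, eps_hat=1.0,
                    alpha_t=0.5, sigma_t=0.5,
                    d_alpha_t=-1.0, d_sigma_t=1.0)
print(v)  # -1.0, matching eps - x_0
```

With every expert's output expressed in this shared velocity space, DDPM- and Flow-Matching-trained experts can be averaged or routed at inference time under a common ODE solver.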
Problem

Research questions and friction points this paper is trying to address.

Decentralized Diffusion Models
Heterogeneous Training Objectives
Computational Resource Efficiency
Diffusion Model Training
Generative Model Decentralization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Heterogeneous Decentralized Training
Diffusion Models
Flow Matching
DDPM
AdaLN-Single