Paris: A Decentralized Trained Open-Weight Diffusion Model

📅 2025-10-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of efficiently training high-quality text-to-image diffusion models without centralized coordination infrastructure. We propose the first fully decentralized pretraining framework for diffusion models: the model is partitioned into eight independent expert subnetworks, each trained autonomously with no parameter or gradient synchronization, while a lightweight semantic-aware transformer router dynamically assigns data to experts via clustering, enabling distributed training across heterogeneous hardware. Our method achieves comparable generation quality using 14× less training data and 16× less compute than the prior decentralized baseline. To our knowledge, this is the first open-source, commercially viable, fully decentralized pretraining framework for text-to-image diffusion models. It empirically validates the feasibility and efficiency of decentralized training in generative AI, establishing a new paradigm for privacy-preserving learning, computational democratization, and federated AI research.

📝 Abstract
We present Paris, the first publicly released diffusion model pre-trained entirely through decentralized computation. Paris demonstrates that high-quality text-to-image generation can be achieved without centrally coordinated infrastructure, and it is open for both research and commercial use. Training Paris required implementing our Distributed Diffusion Training framework from scratch. The model consists of 8 expert diffusion models (129M-605M parameters each) trained in complete isolation with no gradient, parameter, or intermediate activation synchronization. Rather than requiring synchronized gradient updates across thousands of GPUs, we partition the data into semantically coherent clusters where each expert independently optimizes its subset while collectively approximating the full distribution. A lightweight transformer router dynamically selects the appropriate expert at inference, achieving generation quality comparable to centrally coordinated baselines. Eliminating synchronization enables training on heterogeneous hardware without specialized interconnects. Empirical validation confirms that Paris's decentralized training maintains generation quality while removing the dedicated GPU cluster requirement for large-scale diffusion models. Paris achieves this using 14× less training data and 16× less compute than the prior decentralized baseline.
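The data-partitioning idea from the abstract can be sketched in a few lines. The following is a minimal, hypothetical illustration (not the authors' code): caption embeddings are clustered with plain k-means, and each resulting cluster becomes the private training set of one expert diffusion model. The function name, the embedding dimension, and the use of k-means specifically are all assumptions for illustration; the paper's actual semantic clustering may differ.

```python
import numpy as np

def kmeans_partition(embeddings, n_experts=8, n_iters=20, seed=0):
    """Assign each sample to one of n_experts clusters via plain k-means.

    Hypothetical stand-in for the paper's semantic clustering: each
    cluster is handed to one expert, which then trains in isolation.
    """
    rng = np.random.default_rng(seed)
    # Initialize centroids from randomly chosen samples.
    centroids = embeddings[rng.choice(len(embeddings), n_experts, replace=False)]
    for _ in range(n_iters):
        # Assign each embedding to its nearest centroid.
        dists = np.linalg.norm(
            embeddings[:, None, :] - centroids[None, :, :], axis=-1
        )
        labels = dists.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster is empty.
        for k in range(n_experts):
            if (labels == k).any():
                centroids[k] = embeddings[labels == k].mean(axis=0)
    return labels, centroids

# Toy usage: 100 fake caption embeddings of dimension 16, split across 8 experts.
emb = np.random.default_rng(1).normal(size=(100, 16))
labels, cents = kmeans_partition(emb, n_experts=8)
# Expert k then trains only on {x_i : labels[i] == k}, with no gradient,
# parameter, or activation exchange between experts.
```

Because the experts never communicate during training, each cluster can be trained on whatever hardware is available, which is what removes the need for synchronized gradient updates and specialized interconnects.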
Problem

Research questions and friction points this paper is trying to address.

Decentralized training of diffusion models without centralized infrastructure
Eliminating synchronization requirements for heterogeneous hardware training
Achieving comparable generation quality with reduced data and computation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decentralized training without central coordination
Independent experts trained on data clusters
Lightweight router dynamically selects experts
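The inference-time routing in the list above can be sketched as follows. This is a deliberately simplified, hypothetical stand-in (not the paper's transformer router): the prompt embedding is scored against the expert cluster centroids and only the closest expert's diffusion model would be run. The nearest-centroid scoring rule is an assumption for illustration.

```python
import numpy as np

def route(prompt_emb, centroids):
    """Pick the expert whose cluster centroid is closest to the prompt.

    Hypothetical stand-in for the lightweight transformer router: at
    inference, only the selected expert's diffusion model is sampled.
    """
    scores = -np.linalg.norm(centroids - prompt_emb, axis=1)  # higher = closer
    return int(scores.argmax())

# Toy usage: 8 expert centroids in a 16-dim embedding space.
rng = np.random.default_rng(0)
centroids = rng.normal(size=(8, 16))
prompt_emb = centroids[3] + 0.01 * rng.normal(size=16)  # prompt near expert 3
assert route(prompt_emb, centroids) == 3
```

A real router could instead return top-k experts and blend their outputs, but the key property is the same: routing is a cheap forward pass, so the heavy diffusion computation stays confined to one expert per prompt.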