Beyond A Single AI Cluster: A Survey of Decentralized LLM Training

📅 2025-03-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
In the context of globally fragmented computing resources, large language model (LLM) training faces prohibitively high costs and strong centralization barriers. Method: We propose the first resource-driven, decentralized LLM training taxonomy—distinguishing community-driven and organization-driven paradigms—and rigorously delineate their conceptual boundaries with federated learning and distributed training via a unified classification framework. Our analysis integrates systematic literature review, conceptual comparison, and in-depth case studies to establish a multidimensional evaluation framework. Contribution/Results: This work delivers the first comprehensive survey of decentralized LLM training, identifying core technical challenges—including communication overhead, incentive compatibility, and security alignment—while synthesizing representative implementation pathways and outlining concrete future research directions. It provides both theoretical foundations and practical guidance for lowering LLM training barriers and advancing AI democratization.

📝 Abstract
The emergence of large language models (LLMs) has revolutionized AI development, yet their training demands computational resources beyond a single cluster or even datacenter, limiting accessibility to large organizations. Decentralized training has emerged as a promising paradigm to leverage dispersed resources across clusters, datacenters, and global regions, democratizing LLM development for broader communities. As the first comprehensive exploration of this emerging field, we present decentralized LLM training as a resource-driven paradigm and categorize it into community-driven and organizational approaches. Furthermore, our in-depth analysis clarifies decentralized LLM training, including: (1) its position relative to related domain concepts, (2) trends in decentralized resource development, and (3) recent advances discussed under a novel taxonomy. We also provide up-to-date case studies and explore future directions, contributing to the evolution of decentralized LLM training research.
Problem

Research questions and friction points this paper is trying to address.

How can decentralized LLM training leverage globally dispersed computing resources?
How can LLM development be democratized for communities beyond large organizations?
How should the emerging paradigms and trends of decentralized LLM training be systematically surveyed?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decentralized LLM training across clusters
Community-driven and organizational approaches
Novel taxonomy for decentralized resource trends
Haotian Dong
Tsinghua University
Jingyan Jiang
Shenzhen Technology University
Test-time adaptation, Embodied AI, Machine learning systems
Rongwei Lu
Tsinghua University
Distributed machine learning, gradient compression, federated learning
Jiajun Luo
Tsinghua University
Jiajun Song
Michigan Technological University
Wave Energy Converter
Bowen Li
Beihang University
Ying Shen
China Central Depository & Clearing Co., Ltd.
Zhi Wang
Tsinghua University