WWW.Serve: Interconnecting Global LLM Services through Decentralization

📅 2026-03-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the scalability limitations of centralized large language model (LLM) services and the underutilization of globally distributed GPU resources. Existing decentralized approaches often overlook competitive dynamics among participants and rely on unrealistic assumptions. To overcome these issues, we propose a novel decentralized framework that, for the first time, explicitly accounts for participant autonomy and competition in LLM serving. Our approach eliminates reliance on fixed software-hardware stacks and strong central coordination, instead leveraging a decentralized network architecture, a self-organizing request scheduling algorithm, and a flexible resource commitment mechanism to enable autonomous collaboration in heterogeneous environments. Experimental results demonstrate that our method improves global service-level objective attainment by up to 1.5×, reduces latency by 27.6%, and matches or surpasses the performance of centralized schedulers while preserving the inherent benefits of decentralization.
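The summary mentions a "flexible resource commitment mechanism" that lets each GPU provider decide its own participation policy. The paper's actual mechanism is not detailed here, so the sketch below is purely illustrative: the `Commitment` fields and the `accepts` rule are assumptions about what such a provider-side policy might look like.

```python
from dataclasses import dataclass

# Hypothetical provider-side commitment policy; field names are
# illustrative assumptions, not taken from the paper.
@dataclass(frozen=True)
class Commitment:
    max_concurrent: int       # requests the provider will serve at once
    models: frozenset         # model names the provider chooses to host
    min_price: float          # lowest per-token price the provider accepts

def accepts(c: Commitment, model: str, offered_price: float, in_flight: int) -> bool:
    """A provider autonomously decides whether to take a request,
    with no platform-level oversight forcing it to accept."""
    return (
        model in c.models
        and offered_price >= c.min_price
        and in_flight < c.max_concurrent
    )
```

Under this sketch, competition among providers emerges naturally: a request is only served by providers whose self-declared policy it satisfies, rather than being force-assigned to a fixed software-hardware stack.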

📝 Abstract
Large language model (LLM) services are mostly centralized, leading to scalability bottlenecks and underutilization of substantial scattered GPU resources. While decentralization offers a promising alternative, existing frameworks focus primarily on cooperation among GPU providers while overlooking their inherent competitive dynamics, and they impose substantial constraints such as excessive platform-level oversight or rigid requirements to execute all assigned requests using fixed software stacks on fixed hardware configurations. We argue that such assumptions are unrealistic in real-world decentralized environments. To this end, we propose WWW.Serve, a decentralized framework for interconnecting LLM services worldwide. It allows participants to flexibly determine their participation policies and resource commitments, and supports self-organizing request dispatch, enabling the network to autonomously allocate requests without centralized coordination. Empirically, we show that WWW.Serve improves global SLO (service-level objective) attainment by up to 1.5× and lowers latency by 27.6%. Its performance approaches, and in some cases surpasses, centralized scheduling, while fully preserving the benefits of decentralization. These results highlight WWW.Serve as a promising foundation for real-world, decentralized LLM serving.
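The abstract describes "self-organizing request dispatch" in which nodes allocate requests without a central coordinator. The concrete scheduling algorithm is not given on this page, so the following is only a sketch of the general idea under assumed inputs: each node independently scores candidate peers by an estimated completion time and forwards a request only to a peer that can still meet the SLO.

```python
# Illustrative sketch only: the peer-scoring rule below is an assumption,
# not the dispatch algorithm described in the paper.
def pick_peer(peers: dict, request_tokens: int, slo_deadline_s: float):
    """Greedy local dispatch: forward the request to the peer with the
    lowest estimated completion time, skipping peers that would miss
    the SLO. `peers` maps name -> (queue_delay_s, tokens_per_s).
    Returns None if no peer can meet the deadline."""
    best, best_eta = None, float("inf")
    for name, (queue_delay_s, tokens_per_s) in peers.items():
        eta = queue_delay_s + request_tokens / tokens_per_s  # queueing + generation
        if eta <= slo_deadline_s and eta < best_eta:
            best, best_eta = name, eta
    return best
```

Because each node applies such a rule using only locally observable peer state, dispatch decisions need no central scheduler, which is the property the abstract attributes to WWW.Serve.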
Problem

Research questions and friction points this paper is trying to address.

LLM services
decentralization
GPU resource utilization
scalability bottleneck
competitive dynamics

Innovation

Methods, ideas, or system contributions that make the work stand out.

decentralized LLM serving
self-organizing request dispatch
flexible participation policy
global GPU resource utilization
SLO-aware scheduling