Practical Federated Learning without a Server

📅 2025-03-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the deployment limitations of federated learning (FL) imposed by reliance on a centralized parameter server, this paper introduces Plexus—the first serverless, large-scale decentralized FL system. Plexus eliminates the need for any central coordinator through four key innovations: (1) a distributed-consensus-based client sampling mechanism; (2) a local model aggregation strategy; (3) latency-aware dynamic topology construction; and (4) a lightweight communication protocol. Extensive experiments at scale—up to 1,000 participating nodes—demonstrate that Plexus achieves 1.4–1.6× training speedup over prior decentralized FL approaches, reduces communication overhead by 15.8–292×, and cuts computational resource consumption by 30.5–77.9×. These gains significantly enhance system scalability while strengthening privacy preservation by removing centralized trust assumptions and minimizing cross-node data exposure.
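To make the first two innovations concrete, here is a minimal sketch of one serverless training round: a subset of peers is sampled, each trains locally, and the updated models are averaged by a peer rather than a central server. All names (`Node`, `local_update`, `num_samples`) are hypothetical placeholders, and the consensus-based choice of aggregator is replaced by a trivial stand-in; this is an illustration of the idea, not the paper's implementation.

```python
import random

class Node:
    """Hypothetical participant holding a parameter vector and local data size."""
    def __init__(self, model, num_samples):
        self.model = model
        self.num_samples = num_samples

    def local_update(self):
        # Stand-in for local SGD: nudge each parameter toward zero.
        return [p * 0.9 for p in self.model]

def weighted_average(models, weights):
    """Average parameter vectors, weighted by each node's sample count."""
    total = sum(weights)
    dim = len(models[0])
    return [sum(w * m[i] for m, w in zip(models, weights)) / total
            for i in range(dim)]

def serverless_round(nodes, sample_size, rng):
    """One illustrative round: sample peers, train locally, aggregate on a peer.
    Plexus selects the sample and aggregator via distributed consensus;
    here a seeded RNG and the first sampled peer stand in for both."""
    sampled = rng.sample(nodes, sample_size)
    models = [n.local_update() for n in sampled]
    weights = [n.num_samples for n in sampled]
    aggregated = weighted_average(models, weights)  # done by a sampled peer
    for n in nodes:
        n.model = aggregated  # disseminate the new model, no server involved
    return aggregated
```

Note that only the sampled subset trains and communicates each round, which is where the reported communication and compute savings come from relative to all-to-all decentralized schemes.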

📝 Abstract
Federated Learning (FL) enables end-user devices to collaboratively train ML models without sharing raw data, thereby preserving data privacy. In FL, a central parameter server coordinates the learning process by iteratively aggregating the trained models received from clients. Yet, deploying a central server is not always feasible due to hardware unavailability, infrastructure constraints, or operational costs. We present Plexus, a fully decentralized FL system for large networks that operates without the drawbacks originating from having a central server. Plexus distributes the responsibilities of model aggregation and sampling among participating nodes while avoiding network-wide coordination. We evaluate Plexus using realistic traces for compute speed, pairwise latency, and network capacity. Our experiments on three common learning tasks and with up to 1,000 nodes empirically show that Plexus reduces time-to-accuracy by 1.4–1.6×, communication volume by 15.8–292×, and training resources needed for convergence by 30.5–77.9× compared to conventional decentralized learning algorithms.
Problem

Research questions and friction points this paper is trying to address.

Reliance on a central parameter server limits FL deployment (hardware unavailability, infrastructure constraints, operational costs)
High communication volume and training resource consumption in existing decentralized learning algorithms
Poor time-to-accuracy when scaling to large network environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Plexus: the first fully serverless, large-scale decentralized FL system
Model aggregation and client sampling distributed among participating nodes via distributed consensus, avoiding network-wide coordination
Latency-aware dynamic topology construction and a lightweight communication protocol, yielding 1.4–1.6× faster time-to-accuracy and 15.8–292× less communication
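The latency-aware topology idea above can be sketched as each node linking to its lowest-latency peers. This is an illustrative reading of the claim, not the paper's actual construction; the pairwise latency matrix and the parameter `k` are assumptions for the example.

```python
def latency_aware_neighbors(latencies, k):
    """Build a topology where each node keeps its k lowest-latency peers.

    `latencies` is a hypothetical symmetric matrix of pairwise latencies
    (e.g. in milliseconds); entry [i][j] is the latency between nodes i and j.
    """
    topology = {}
    for i, row in enumerate(latencies):
        # Sort peers by latency, excluding the node itself.
        peers = sorted((lat, j) for j, lat in enumerate(row) if j != i)
        topology[i] = [j for _, j in peers[:k]]
    return topology
```

For example, with three nodes where node 1 sits close to both others, nodes 0 and 2 would each pick node 1 as their sole neighbor when `k=1`, keeping model exchange on the fastest links.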