Ampere: Communication-Efficient and High-Accuracy Split Federated Learning

📅 2025-07-08
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address three key challenges in Split Federated Learning (SFL), namely high communication overhead, heavy on-device computation, and degraded model accuracy under non-IID data, this paper proposes Ampere, a decoupled and efficient training framework. Ampere introduces: (1) a unidirectional block-wise training mechanism with a locally defined loss function, eliminating gradient uploads entirely; (2) a lightweight auxiliary network generation technique that compresses frequent intermediate activation exchanges into a single transfer; and (3) a cross-device activation aggregation strategy that mitigates the impact of data heterogeneity. Extensive experiments demonstrate that, compared to state-of-the-art SFL approaches, Ampere achieves up to 13.26% higher accuracy (with a 53.39% reduction in its standard deviation), reduces training time by up to 94.6%, cuts communication overhead by up to 99.1%, and decreases on-device computational load by up to 93.13%.

πŸ“ Abstract
A Federated Learning (FL) system collaboratively trains neural networks across devices and a server but is limited by significant on-device computation costs. Split Federated Learning (SFL) systems mitigate this by offloading a block of layers of the network from the device to a server. However, in doing so, they introduce large communication overheads due to frequent exchanges of intermediate activations and gradients between devices and the server, and reduce model accuracy for non-IID data. We propose Ampere, a novel collaborative training system that simultaneously minimizes on-device computation and device-server communication while improving model accuracy. Unlike SFL, which optimizes a global loss through iterative end-to-end training, Ampere develops unidirectional inter-block training to sequentially train the device and server blocks with a local loss, eliminating the transfer of gradients. A lightweight auxiliary network generation method decouples training between the device and server, reducing frequent intermediate exchanges to a single transfer, which significantly reduces the communication overhead. Ampere mitigates the impact of data heterogeneity by consolidating activations generated by the trained device block to train the server block, in contrast to SFL, which trains on device-specific, non-IID activations. Extensive experiments on multiple CNNs and transformers show that, compared to state-of-the-art SFL baseline systems, Ampere (i) improves model accuracy by up to 13.26% while reducing training time by up to 94.6%, (ii) reduces device-server communication overhead by up to 99.1% and on-device computation by up to 93.13%, and (iii) reduces the standard deviation of accuracy by 53.39% across various non-IID degrees, highlighting superior performance when faced with heterogeneous data.
Problem

Research questions and friction points this paper is trying to address.

High communication overhead in Split Federated Learning
Degraded model accuracy under non-IID data
Heavy on-device computation and device-server communication
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unidirectional inter-block training with local loss
Lightweight auxiliary network generation method
Cross-device activation consolidation to mitigate data heterogeneity
Zihan Zhang
School of Computer Science, University of St Andrews, UK
Leon Wong
Rakuten Mobile, Inc., Japan
Blesson Varghese
Reader in Computer Science, University of St Andrews, UK
Distributed systems · Cloud/Edge computing · Edge intelligence · Distributed machine learning