AI Summary
To address the three key challenges in Split Federated Learning (SFL), namely high communication overhead, heavy on-device computation, and degraded model accuracy under non-IID data, this paper proposes a decoupled, efficient training framework. Our method introduces: (1) a unidirectional block-wise training mechanism with a locally defined loss function, eliminating gradient uploads entirely; (2) a lightweight auxiliary network generation technique that compresses frequent intermediate activation exchanges into a single transmission; and (3) a cross-device activation aggregation strategy to mitigate the impact of data heterogeneity. Extensive experiments demonstrate that, compared to state-of-the-art SFL approaches, our framework achieves up to 13.26% higher accuracy (with a 53.39% reduction in standard deviation), reduces training time by 94.6%, cuts communication overhead by 99.1%, and decreases on-device computational load by 93.13%.
Abstract
A Federated Learning (FL) system collaboratively trains neural networks across devices and a server but is limited by significant on-device computation costs. Split Federated Learning (SFL) systems mitigate this by offloading a block of layers of the network from the device to a server. However, in doing so, SFL introduces large communication overhead due to frequent exchanges of intermediate activations and gradients between devices and the server, and reduces model accuracy for non-IID data. We propose Ampere, a novel collaborative training system that simultaneously minimizes on-device computation and device-server communication while improving model accuracy. Unlike SFL, which optimizes a global loss via iterative end-to-end training, Ampere develops unidirectional inter-block training that sequentially trains the device and server blocks with local losses, eliminating the transfer of gradients. A lightweight auxiliary network generation method decouples training between the device and server, reducing frequent intermediate activation exchanges to a single transfer, which significantly reduces communication overhead. Ampere mitigates the impact of data heterogeneity by consolidating the activations generated by the trained device block to train the server block, in contrast to SFL, which trains on device-specific, non-IID activations. Extensive experiments on multiple CNNs and transformers show that, compared to state-of-the-art SFL baseline systems, Ampere (i) improves model accuracy by up to 13.26% while reducing training time by up to 94.6%, (ii) reduces device-server communication overhead by up to 99.1% and on-device computation by up to 93.13%, and (iii) reduces the standard deviation of accuracy by 53.39% across varying degrees of non-IID data, highlighting superior performance on heterogeneous data.
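The core idea of unidirectional inter-block training can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the names (`LinearBlock`, `train_block_locally`, the auxiliary linear head) and the toy regression task are illustrative assumptions. It shows the key structural property the abstract describes: each block is trained greedily against its own local loss through an auxiliary head, and only forward activations cross the block boundary, never gradients.

```python
import numpy as np

rng = np.random.default_rng(0)

class LinearBlock:
    """One trainable block: h = relu(x @ W). Illustrative, not the paper's API."""
    def __init__(self, d_in, d_out):
        self.W = rng.normal(0.0, 0.1, (d_in, d_out))

    def forward(self, x):
        self.x = x
        self.h = np.maximum(x @ self.W, 0.0)
        return self.h

    def backward(self, grad_h, lr=0.1):
        # Gradient stops here: the block updates its own weights and
        # returns nothing upstream (no gradient crosses the block boundary).
        grad_pre = grad_h * (self.h > 0)
        self.W -= lr * self.x.T @ grad_pre

def train_block_locally(block, aux_W, x, y, steps=200, lr=0.1):
    """Train one block with a local squared loss via a linear auxiliary head."""
    for _ in range(steps):
        h = block.forward(x)
        err = (h @ aux_W) - y                  # local loss: 0.5 * ||pred - y||^2
        aux_W -= lr * h.T @ err / len(x)       # update auxiliary head
        block.backward(err @ aux_W.T / len(x), lr)
    return block.forward(x)                    # detached activations for the next block

# Toy regression data standing in for device-side inputs and labels.
x = rng.normal(size=(64, 8))
y = np.tanh(x @ rng.normal(size=(8, 1)))

# Device block trains first, with its own auxiliary head and local loss.
device_block = LinearBlock(8, 16)
acts = train_block_locally(device_block, rng.normal(0.0, 0.1, (16, 1)), x, y)

# Server block then trains on a single transfer of device activations;
# no gradients ever flow back to the device.
server_block = LinearBlock(16, 16)
final = train_block_locally(server_block, rng.normal(0.0, 0.1, (16, 1)), acts, y)
```

Because the device block finishes training before its activations are sent, the frequent per-iteration activation/gradient exchanges of SFL collapse to one upload, which is the communication saving the abstract claims.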