FSL-SAGE: Accelerating Federated Split Learning via Smashed Activation Gradient Estimation

📅 2025-05-29
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Federated learning (FL) requires clients to train full models, making it infeasible for large-scale models; split learning (SL) alleviates client memory constraints but suffers from high communication latency due to its sequential training. Existing parallelization approaches are accuracy-limited by the absence of server-side gradient feedback. This paper proposes a novel federated split learning framework. It introduces the first gradient-estimation mechanism based on smashed activations, employing a lightweight auxiliary model to dynamically approximate server gradient behavior, enabling efficient local updates without explicit gradient backpropagation from the server. The framework integrates local-loss-driven parallel training, knowledge distillation, and adaptive model synchronization. We theoretically establish an $O(1/\sqrt{T})$ convergence rate, matching that of FedAvg. Experiments demonstrate substantial reductions in both communication overhead and client memory usage, achieving a favorable trade-off between accuracy and efficiency.

📝 Abstract
Collaborative training methods like Federated Learning (FL) and Split Learning (SL) enable distributed machine learning without sharing raw data. However, FL assumes clients can train entire models, which is infeasible for large-scale models. In contrast, while SL alleviates the client memory constraint in FL by offloading most training to the server, it increases network latency due to its sequential nature. Other methods address the conundrum by using local loss functions for parallel client-side training to improve efficiency, but they lack server feedback and potentially suffer poor accuracy. We propose FSL-SAGE (Federated Split Learning via Smashed Activation Gradient Estimation), a new federated split learning algorithm that estimates server-side gradient feedback via auxiliary models. These auxiliary models periodically adapt to emulate server behavior on local datasets. We show that FSL-SAGE achieves a convergence rate of $\mathcal{O}(1/\sqrt{T})$, where $T$ is the number of communication rounds. This result matches FedAvg, while significantly reducing communication costs and client memory requirements. Our empirical results also verify that it outperforms existing state-of-the-art FSL methods, offering both communication efficiency and accuracy.
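The core idea in the abstract can be illustrated with a toy sketch: the client backpropagates through a lightweight auxiliary head instead of waiting for the server's gradient on its smashed activations, and the auxiliary head is periodically refit so its gradients emulate the server's on local data. This is a minimal illustration of the gradient-estimation concept, not the paper's actual algorithm; all model shapes, the squared loss, and the least-squares alignment step are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_smash, n = 8, 4, 32

W_client = rng.normal(size=(d_in, d_smash)) * 0.1   # client-side model
W_server = rng.normal(size=(d_smash, 1))            # "server" head (ground truth)
W_aux    = np.zeros((d_smash, 1))                   # lightweight auxiliary head

X = rng.normal(size=(n, d_in))                      # local client data
y = rng.normal(size=(n, 1))

def server_grad_wrt_smash(Z):
    """Gradient of the server's squared loss w.r.t. the smashed activations Z."""
    return 2.0 * (Z @ W_server - y) @ W_server.T / n

def aux_grad_wrt_smash(Z):
    """Auxiliary estimate of that gradient -- no server round-trip needed."""
    return 2.0 * (Z @ W_aux - y) @ W_aux.T / n

lr = 0.01
for step in range(50):
    Z = X @ W_client                    # smashed activations (client cut layer)
    g_Z = aux_grad_wrt_smash(Z)         # estimated server-side gradient feedback
    W_client -= lr * (X.T @ g_Z)        # local update without server backprop
    if step % 10 == 0:
        # periodic alignment round: refit the auxiliary head so that its
        # outputs (hence gradients) emulate the server on the local data
        W_aux, *_ = np.linalg.lstsq(Z, Z @ W_server, rcond=None)

# After alignment, auxiliary gradients should closely track the server's.
Z = X @ W_client
err = np.linalg.norm(aux_grad_wrt_smash(Z) - server_grad_wrt_smash(Z))
print(f"gradient-estimation error: {err:.6f}")
```

In this linear toy case the least-squares fit recovers the server head exactly, so the estimated and true gradients coincide; the paper's contribution is making this kind of estimation work for deep nonlinear models with a provable $O(1/\sqrt{T})$ rate.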
Problem

Research questions and friction points this paper is trying to address.

Reducing network latency in split learning
Improving accuracy with server feedback
Lowering client memory requirements
Innovation

Methods, ideas, or system contributions that make the work stand out.

Estimates server gradients via auxiliary models
Reduces communication costs significantly
Maintains accuracy while lowering memory needs
🔎 Similar Papers
2024-01-22 · AAAI Conference on Artificial Intelligence · Citations: 1
👥 Authors

Srijith Nair
Department of Electrical and Computer Engineering, The Ohio State University, Columbus, Ohio, USA

Michael Lin
Glaucoma Specialist, Massachusetts Eye and Ear

Amirreza Talebi
Department of Industrial Engineering, The Ohio State University, Columbus, Ohio, USA

Peizhong Ju
Assistant Professor of Computer Science, University of Kentucky

Elizabeth Bentley
Air Force Research Laboratory, Rome, New York, USA

Jia Liu
Department of Electrical and Computer Engineering, The Ohio State University, Columbus, Ohio, USA