🤖 AI Summary
This work addresses the challenges of computational load imbalance and excessive communication overhead in fine-tuning large language models via federated split learning, which arise from device and data heterogeneity. To mitigate these issues, the authors propose an adaptive federated split learning system that introduces, for the first time, a dynamically adjustable split-layer mechanism, enabling each client to autonomously select its optimal splitting point based on local computational capacity and model performance. The system further integrates Low-Rank Adaptation (LoRA) to compress transmitted model updates. To better emulate realistic heterogeneous scenarios, a sequence-length-aware Dirichlet-based data partitioning strategy is devised. Experimental results across multiple mainstream benchmarks demonstrate that the proposed approach significantly reduces both communication costs and training time while simultaneously improving model accuracy.
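The summary says LoRA is used to compress transmitted model updates. To show roughly why a lower rank shrinks the payload, the sketch below counts LoRA parameters. It assumes that a client uploads the two LoRA factors (B of size d_out x rank, A of size rank x d_in) for each adapted weight matrix, and that each parameter takes 2 bytes (fp16/bf16). The helper names `lora_params` and `upload_bytes` and the 4096 x 4096 layer shape are hypothetical illustrations. The text does not state exactly what SplitFT transmits.

```python
# Hypothetical helpers for estimating LoRA upload size; not SplitFT's code.

def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in one LoRA adapter: B is d_out x rank, A is rank x d_in."""
    return rank * (d_out + d_in)


def upload_bytes(layer_shapes, rank: int, bytes_per_param: int = 2) -> int:
    """Bytes needed to send every adapter, assuming 2-byte (fp16/bf16) values."""
    return sum(lora_params(o, i, rank) for o, i in layer_shapes) * bytes_per_param


if __name__ == "__main__":
    shape = (4096, 4096)  # assumed size of one attention projection matrix
    full = shape[0] * shape[1]
    for r in (16, 8):
        p = lora_params(*shape, r)
        print(f"rank {r}: {p} params, {full // p}x fewer than the full matrix")
```

For a 4096 x 4096 matrix, rank 16 gives 131072 adapter parameters, 128x fewer than the full matrix. Halving the rank to 8 halves the payload again. Because the adapter size grows linearly in the rank, reducing the rank is a direct lever on communication volume.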
📝 Abstract
Federated split learning has been identified as an efficient approach to addressing the computational resource constraints of clients in classical federated learning, while protecting data privacy during distributed model training across data owners. However, it faces critical challenges when this training strategy is applied to fine-tuning large language models (LLMs). One challenge is setting the cut layer adaptively for each client to cope with data and device heterogeneity, which significantly affects system performance. Another is efficiently reducing the communication overhead during fine-tuning. To the best of our knowledge, no existing work addresses these challenges.
To bridge this gap, we propose SplitFT, an adaptive federated split learning system for LLM fine-tuning. SplitFT allows each client to set its own cut layer according to its computational resources and the performance of the trained model. SplitFT also reduces the LoRA rank at the cut layer to lower the communication overhead. In addition, to simulate the heterogeneous data found in real-world applications of our split federated learning system, we propose a length-based Dirichlet approach for partitioning the training data across clients. Extensive experiments on various popular benchmarks show that our approach outperforms the state-of-the-art approach in both fine-tuning time efficiency and model performance.
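The abstract does not spell out how the length-based Dirichlet partitioning works. The sketch below follows one plausible reading, which is an assumption and not the paper's method. It adapts the common label-based Dirichlet split by grouping samples into sequence-length buckets, which stand in for class labels. Each bucket is then divided among clients using proportions drawn from Dir(alpha). The function name `length_dirichlet_partition`, the quantile bucketing, and the parameters `num_buckets`, `alpha`, and `seed` are hypothetical choices.

```python
import numpy as np


def length_dirichlet_partition(lengths, num_clients, num_buckets=4, alpha=0.5, seed=0):
    """Hypothetical sketch: bucket samples by sequence length, then split each
    bucket across clients with Dirichlet(alpha) proportions."""
    rng = np.random.default_rng(seed)
    lengths = np.asarray(lengths)
    # Interior quantile edges, so each bucket holds roughly the same number of samples.
    edges = np.quantile(lengths, np.linspace(0, 1, num_buckets + 1)[1:-1])
    bucket_ids = np.digitize(lengths, edges)  # values in 0 .. num_buckets - 1
    client_indices = [[] for _ in range(num_clients)]
    for b in range(num_buckets):
        idx = np.flatnonzero(bucket_ids == b)
        rng.shuffle(idx)
        # Smaller alpha gives a more skewed share of this length bucket per client.
        props = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return client_indices


if __name__ == "__main__":
    lengths = np.random.default_rng(1).integers(16, 2048, size=1000)
    parts = length_dirichlet_partition(lengths, num_clients=5)
    for c, p in enumerate(parts):
        print(c, len(p), int(np.median(lengths[p])) if p else None)
```

Every sample is assigned to exactly one client. A smaller `alpha` concentrates each length bucket on fewer clients, which gives clients noticeably different length profiles. A larger `alpha` moves the split toward uniform.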