🤖 AI Summary
To address the challenges of fine-tuning large language models (LLMs) on resource-constrained, heterogeneous edge devices with non-IID data distributions, this paper proposes a hierarchical decoupled fine-tuning framework: the device side freezes the model backbone and performs only forward inference, while the server side applies low-rank adaptation (LoRA) for parameter-efficient fine-tuning. By integrating decomposition-based backpropagation with pipeline parallelism, the approach eliminates backward computation on edge devices, achieving zero local gradient computation for the first time. Combining split learning and LoRA, the method improves accuracy by up to 69.4% over FedLoRA and SplitLoRA on GPT-2 under extremely imbalanced data, reduces on-device computation by up to 86.8%, and cuts total training time by 50.2%. The paper further validates cross-model and cross-task scalability on Llama-3.2 for GSM8K generation tasks.
📝 Abstract
Fine-tuning large language models (LLMs) on private, on-device data can empower tailored, personalized AI agents. However, fine-tuning LLMs on resource-constrained edge devices faces significant challenges, including excessive computation overhead, device heterogeneity, and data imbalance. This paper proposes SplitFrozen, a split learning framework that enables efficient LLM fine-tuning by strategically freezing device-side model layers while centralizing parameter-efficient fine-tuning on the server. Our framework partitions LLMs into device-side frozen layers and server-side fine-tuning layers, where heterogeneous resource-constrained devices execute only forward propagation. To minimize server-side training costs, we integrate Low-Rank Adaptation (LoRA) into the server-side layers. A pipeline parallelism strategy further improves training efficiency by decoupling device-server computations and leveraging decomposed backward propagation. Experiments on GPT-2 with the MRPC, MNLI-matched, and SST-2 datasets demonstrate that SplitFrozen outperforms FedLoRA and SplitLoRA by up to 69.4% in model accuracy under extremely imbalanced data, while reducing device-side computation by up to 86.8% and total training time by 50.2%. Experiments also validate the scalability of SplitFrozen on a content generation task using the Llama-3.2 model on the GSM8K dataset.
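To make the split concrete, here is a minimal PyTorch sketch of the idea described above: the device holds frozen lower layers and runs forward inference only, sending activations to the server, where LoRA adapters on the upper layers receive all gradient computation. All layer sizes, module names, and hyperparameters here are illustrative assumptions, not the paper's actual architecture or implementation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank (A, B) adapter."""
    def __init__(self, in_f, out_f, r=4, alpha=8):
        super().__init__()
        self.base = nn.Linear(in_f, out_f)
        self.base.weight.requires_grad_(False)   # base weights stay frozen
        self.base.bias.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, in_f) * 0.01)  # trainable
        self.B = nn.Parameter(torch.zeros(out_f, r))        # trainable
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

# Device side: frozen backbone layers (illustrative sizes).
device_layers = nn.Sequential(nn.Linear(16, 32), nn.ReLU()).requires_grad_(False)

# Server side: LoRA-adapted layer plus a trainable task head.
server_layers = nn.Sequential(LoRALinear(32, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.Adam(
    [p for p in server_layers.parameters() if p.requires_grad], lr=1e-3
)

x = torch.randn(8, 16)                 # toy on-device batch
y = torch.randint(0, 2, (8,))          # toy labels

with torch.no_grad():                  # device performs forward inference only
    smashed = device_layers(x)         # activations sent to the server

logits = server_layers(smashed)
loss = nn.functional.cross_entropy(logits, y)
loss.backward()                        # backward pass runs only on the server
opt.step()
```

Because the device-side forward runs under `torch.no_grad()`, no activations or gradients are retained on the device; only the server's LoRA parameters and task head accumulate gradients, which is the core cost saving the framework claims.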