Split Fine-Tuning for Large Language Models in Wireless Networks

📅 2025-01-16
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To enable fine-tuning of large language models (LLMs) on memory- and compute-constrained mobile devices in wireless networks, where communication overhead is a major bottleneck, this paper proposes a Split Fine-Tuning (SFT) framework that partitions an LLM between an edge server and end devices for collaborative, parallel fine-tuning. The paper makes two key contributions: a data compression scheme that combines gradient sparsification, stochastic quantization, and lossless encoding to shrink the activations and gradients exchanged between devices and the server; and a two-timescale resource management algorithm that jointly optimizes the compression rate and transformer block allocation on the large timescale via the augmented Lagrangian method, and spectrum resource allocation on the small timescale via sequential quadratic programming (SQP), minimizing fine-tuning delay under accuracy and device-side memory constraints while accounting for heterogeneous hardware and dynamic channel conditions. Simulation results demonstrate that SFT reduces fine-tuning delay by up to 80.2% and communication overhead by 93.6% compared to state-of-the-art benchmarks, while satisfying on-device memory and target accuracy requirements.

📝 Abstract
Fine-tuning is the process of adapting pre-trained large language models (LLMs) to downstream tasks. Due to their substantial number of parameters, fine-tuning LLMs on mobile devices demands considerable memory and suffers from high communication overhead and long fine-tuning delay. In this paper, we propose an efficient LLM fine-tuning scheme in wireless networks, named Split Fine-Tuning (SFT), which can accommodate LLM fine-tuning on mobile devices. Specifically, an LLM is split into a server-side part on the edge server and a device-side part on the mobile device to satisfy the device-side memory constraint. All devices share the server-side model and perform fine-tuning in parallel to reduce fine-tuning delay. In addition, to reduce the significant communication overhead incurred by data exchange between devices and the edge server, we propose a data compression scheme that jointly leverages sparsification, stochastic quantization, and lossless encoding. Furthermore, we formulate a fine-tuning delay minimization problem under accuracy and memory constraints, taking device heterogeneity and channel dynamics into account. To solve this nonlinear mixed-integer problem, we decouple it into two subproblems on different timescales. A two-timescale resource management algorithm is proposed that jointly optimizes the compression rate and transformer block allocation on the large timescale using the augmented Lagrangian method, and determines spectrum resource allocation on the small timescale via sequential quadratic programming. Extensive simulation results demonstrate that the proposed scheme reduces fine-tuning delay by up to 80.2% and communication overhead by 93.6% compared to state-of-the-art benchmarks, while satisfying device-side memory and model accuracy constraints.
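The compression pipeline described in the abstract (sparsification, then stochastic quantization, then encoding) can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the top-k sparsification rule, the 4-bit uniform quantizer, and the keep ratio are assumed parameters, and the lossless encoding stage is omitted since any entropy coder could be applied to the quantized indices and values.

```python
import numpy as np

def sparsify_topk(grad, keep_ratio=0.1):
    """Keep only the largest-magnitude entries of the gradient (top-k sparsification)."""
    flat = grad.ravel()
    k = max(1, int(len(flat) * keep_ratio))
    idx = np.argpartition(np.abs(flat), -k)[-k:]  # indices of the k largest magnitudes
    return idx, flat[idx], grad.shape

def stochastic_quantize(values, num_bits=4):
    """Unbiased stochastic rounding onto 2**num_bits uniform levels over [min, max]."""
    lo, hi = values.min(), values.max()
    levels = 2 ** num_bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    normalized = (values - lo) / scale           # position in [0, levels]
    floor = np.floor(normalized)
    prob_up = normalized - floor                 # round up with this probability -> unbiased
    q = floor + (np.random.rand(len(values)) < prob_up)
    return q.astype(np.uint8), lo, scale

def decompress(idx, q, lo, scale, shape):
    """Rebuild a dense gradient from the sparse, quantized representation."""
    flat = np.zeros(int(np.prod(shape)))
    flat[idx] = lo + q * scale                   # dequantize kept entries; rest stay zero
    return flat.reshape(shape)

rng = np.random.default_rng(0)
g = rng.standard_normal((64, 64))
idx, vals, shape = sparsify_topk(g, keep_ratio=0.05)
q, lo, scale = stochastic_quantize(vals, num_bits=4)
g_hat = decompress(idx, q, lo, scale, shape)
```

With a 5% keep ratio and 4-bit values, the payload per gradient drops from 32 bits per entry to roughly 0.05 × (4 + log2(4096)) ≈ 0.8 bits per entry before entropy coding, which is the kind of reduction that makes the reported 93.6% communication saving plausible; stochastic (rather than nearest) rounding keeps the quantizer unbiased so the error averages out across iterations.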
Problem

Research questions and friction points this paper is trying to address.

Wireless Networks
Large Language Model Fine-tuning
Resource-constrained Devices
Innovation

Methods, ideas, or system contributions that make the work stand out.

Split Fine-Tuning
gradient sparsification
stochastic quantization
👥 Authors
Songge Zhang
School of Electronic and Computer Engineering, Peking University, Shenzhen, 518000, China, and also with the Frontier Research Center, Pengcheng Laboratory, Shenzhen, 518055, China
Guoliang Cheng
Frontier Research Center, Pengcheng Laboratory, Shenzhen, 518055, China
Xinyu Huang
Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, N2L 3G1, Canada
Zuguang Li
Frontier Research Center, Pengcheng Laboratory, Shenzhen, 518055, China
Wen Wu
Frontier Research Center, Pengcheng Laboratory, Shenzhen, 518055, China
Lingyang Song
Peking University & Peng Cheng Laboratory, China
Wireless Communications, Mobile Computing, Machine Learning, Game Theory
Xuemin (Sherman) Shen
University Professor, University of Waterloo
Wireless networking, MAC, Network security, VANET