Low-latency Federated LLM Fine-tuning Over Wireless Networks

📅 2026-02-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the high fine-tuning latency and communication overhead that the massive parameter counts of large language models impose on wireless federated learning, particularly under resource-constrained and heterogeneous client conditions. To mitigate this, the authors propose the Joint Client-specific Pruning and Bandwidth Allocation (JCPBA) framework, which couples client-aware dynamic pruning with wireless bandwidth allocation in a single joint optimization that minimizes end-to-end fine-tuning latency. The framework employs a block coordinate descent algorithm to solve the coupled optimization problem over pruning ratios and bandwidth allocations, effectively unifying model compression with communication scheduling. Experimental results on the Yahoo Answers and GSM8K datasets demonstrate that JCPBA significantly reduces fine-tuning time and resource consumption while achieving comparable or lower test loss.

📝 Abstract
Recently, federated large language models (LLMs) have drawn significant attention thanks to the combination of LLM capabilities and federated learning (FL), which addresses privacy concerns in collaborative fine-tuning. However, due to the large-scale parameters of LLMs, existing federated LLM fine-tuning frameworks face significant challenges on resource-constrained clients characterized by heterogeneous computing capabilities and random wireless channels. To address this issue, we propose a joint client-specific pruning and bandwidth allocation (JCPBA) framework for federated LLMs to improve fine-tuning efficiency over wireless networks. Specifically, we formulate a fine-tuning latency minimization problem by jointly optimizing pruning rates and bandwidth allocations. Furthermore, we solve this optimization problem using a block coordinate descent method. Extensive experiments on the Yahoo Answers and GSM8K datasets demonstrate that the proposed framework significantly reduces wall-clock fine-tuning time compared with state-of-the-art baselines and achieves equal or lower test loss with lower computation and communication overhead.
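The abstract describes a block coordinate descent that alternates between optimizing per-client pruning ratios and bandwidth allocations to minimize round latency. The paper's actual system model is not reproduced on this page, so the following is only a minimal illustrative sketch under assumed compute and communication models: all constants (`C`, `S`, `f`, `r`, `B`) and the pruning penalty weight `lam` are hypothetical, bandwidth is equalized via bisection on a target round time, and pruning ratios are grid-searched coordinate-wise.

```python
# Illustrative block-coordinate-descent sketch; the model and all constants
# below are assumptions for illustration, not the paper's actual formulation.
import numpy as np

rng = np.random.default_rng(0)
N = 4                       # number of clients
C = 2e12                    # full-model FLOPs per local round (assumed)
S = 8e9                     # full-model bits uploaded per round (assumed)
f = rng.uniform(1e11, 5e11, N)   # client compute speeds (FLOPs/s)
r = rng.uniform(1.0, 4.0, N)     # spectral efficiencies (bits/s/Hz)
B = 20e6                    # total bandwidth budget (Hz)
lam = 100.0                 # penalty weight proxying accuracy loss from pruning

def round_latency(rho, b):
    """Per-client compute + upload latency under pruning ratios rho, bandwidth b."""
    t_cmp = (1 - rho) * C / f
    t_com = (1 - rho) * S / (b * r)
    return t_cmp + t_com

def objective(rho, b):
    # Synchronous FL: the round lasts until the slowest client; pruning penalized.
    return round_latency(rho, b).max() + lam * rho.mean()

def best_bandwidth(rho):
    """Given rho, equalize latencies via bisection on the target round time T."""
    t_cmp = (1 - rho) * C / f
    bits = (1 - rho) * S
    lo, hi = t_cmp.max() + 1e-9, 1e6
    for _ in range(60):
        T = 0.5 * (lo + hi)
        need = bits / (r * (T - t_cmp))   # bandwidth each client needs to meet T
        if need.sum() > B:
            lo = T                        # infeasible: relax the target
        else:
            hi = T                        # feasible: tighten the target
    b = bits / (r * (hi - t_cmp))
    return b * (B / b.sum())             # scale to use the full bandwidth budget

def best_pruning(b, grid=np.linspace(0.0, 0.8, 81)):
    """Given b, grid-search each client's pruning ratio coordinate-wise."""
    rho = np.zeros(N)
    for i in range(N):
        costs = [objective(np.where(np.arange(N) == i, g, rho), b) for g in grid]
        rho[i] = grid[int(np.argmin(costs))]
    return rho

# Block coordinate descent: alternate the two subproblems for a few rounds.
rho = np.zeros(N)
b = np.full(N, B / N)
for _ in range(10):
    b = best_bandwidth(rho)
    rho = best_pruning(b)
print(f"round latency: {round_latency(rho, b).max():.2f} s, pruning ratios: {rho}")
```

One property worth noting in this toy version: since the objective is a max over clients plus a pruning penalty, the coordinate-wise search prunes mainly the bottleneck clients and leaves fast, well-connected clients nearly unpruned, which is the qualitative behavior one would expect from a client-specific scheme.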
Problem

Research questions and friction points this paper is trying to address.

federated learning
large language models
low-latency
wireless networks
resource-constrained clients
Innovation

Methods, ideas, or system contributions that make the work stand out.

Federated LLM
Client-specific pruning
Bandwidth allocation
Low-latency fine-tuning
Wireless networks
Zhiwen Pang
School of Electronic and Optical Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
Kang Wei
School of Cyber Science and Engineering, Southeast University, Nanjing 211189, China
Long Shi
School of Electronic and Optical Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
Zhe Wang
Nanjing University of Science and Technology
UAV Communications, Cognitive Radio, Reinforcement Learning, Game Theory
Jun Li
School of Information Science and Engineering, Southeast University, Nanjing 211189, China
Feng Shu
School of Information and Communication Engineering, Hainan University, Haikou 570228, China