🤖 AI Summary
This work addresses the challenge of high fine-tuning latency and communication overhead in wireless federated learning caused by the massive parameter count of large language models, particularly under resource-constrained and heterogeneous client conditions. To mitigate this, the authors propose the Joint Client-specific Pruning and Bandwidth Allocation (JCPBA) framework, which uniquely integrates client-aware dynamic pruning with wireless bandwidth allocation through joint optimization to minimize end-to-end fine-tuning latency. The framework employs a block coordinate descent algorithm to solve the coupled optimization problem of pruning ratios and bandwidth allocation, effectively unifying model compression with communication scheduling. Experimental results on the Yahoo Answers and GSM8K datasets demonstrate that JCPBA significantly reduces fine-tuning time and resource consumption while achieving comparable or lower test loss.
📝 Abstract
Recently, federated large language models (LLMs) have drawn significant attention thanks to the coupled capabilities of LLMs and federated learning (FL), which address privacy concerns in collaborative fine-tuning. However, due to the large-scale parameters of LLMs, existing federated LLM fine-tuning frameworks face significant challenges on resource-constrained clients characterized by heterogeneous computing capabilities and random wireless channels. To address this issue, we propose a joint client-specific pruning and bandwidth allocation (JCPBA) framework for federated LLMs that improves fine-tuning efficiency over wireless networks. Specifically, we formulate a fine-tuning latency minimization problem that jointly optimizes pruning ratios and bandwidth allocation, and we solve it with a block coordinate descent method. Extensive experiments on the Yahoo Answers and GSM8K datasets demonstrate that the proposed framework significantly reduces wall-clock fine-tuning time compared with state-of-the-art baselines and achieves equal or lower test loss with reduced computation and communication overhead.
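The alternating structure of the block coordinate descent step can be illustrated with a toy sketch. This is not the paper's formulation: the additive latency objective, the quadratic penalty standing in for an accuracy constraint, and all symbols (`c`, `r`, `s`, `lam`) are assumptions made for illustration only.

```python
import numpy as np

def jcpba_bcd_sketch(c, r, s=1.0, lam=5.0, p_max=0.8, iters=20):
    """Toy block coordinate descent alternating between per-client
    pruning ratios p and bandwidth fractions b (illustrative only).

    c   : per-client compute time for the full (unpruned) model
    r   : per-client rate per unit of allocated bandwidth
    s   : full model update size (normalized)
    Assumed objective:
        sum_i (1 - p_i) * (c_i + s / (b_i * r_i)) + lam * sum_i p_i**2
    """
    n = len(c)
    b = np.full(n, 1.0 / n)      # start from an equal bandwidth split
    p = np.zeros(n)              # start with no pruning
    for _ in range(iters):
        # Block 1: pruning ratios. With the quadratic penalty the
        # per-client subproblem is quadratic and has a closed form.
        a = c + s / (b * r)      # per-client full-model latency
        p = np.clip(a / (2.0 * lam), 0.0, p_max)
        # Block 2: bandwidth fractions. Minimizing the sum of
        # communication latencies under sum(b) = 1 gives the
        # Cauchy-Schwarz optimum b_i proportional to sqrt(k_i).
        k = (1.0 - p) * s / r
        b = np.sqrt(k) / np.sqrt(k).sum()
    latency = np.sum((1.0 - p) * (c + s / (b * r)))
    return p, b, latency
```

Each block admits a closed-form update here only because of the simplified objective; the paper's coupled min-latency problem would generally require numerical subroutines inside each block.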