🤖 AI Summary
To address the high communication overhead and heterogeneous-architecture adaptation challenges in federated fine-tuning of large language models (LLMs) over wireless networks, this paper proposes a communication-aware knowledge distillation framework. The method integrates three key components: (1) communication-state-driven adaptive Top-k logits sparsification, (2) dynamic logits aggregation that eliminates zero-padding noise, and (3) LoRA-based hidden-layer projection for efficient knowledge transfer under bandwidth constraints. Unlike conventional parameter-sharing or standard federated distillation approaches, the framework significantly alleviates the burden of high-dimensional logits transmission while preserving model consistency and generalization across heterogeneous clients. Experimental results demonstrate that the proposed method reduces communication cost by approximately 50% while maintaining comparable model performance, achieving both practical deployability and scalability in resource-constrained wireless federated learning settings.
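The first component, communication-state-driven Top-k sparsification, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the linear mapping from the bandwidth fraction to k, and the names `adaptive_topk_sparsify`, `bw_max`, `k_min`, and `k_max`, are all assumptions for the sake of the example.

```python
def adaptive_topk_sparsify(logits, bandwidth, bw_max, k_min=2, k_max=8):
    """Keep only the k largest logits, with k scaled by current bandwidth.

    logits    : list of per-vocabulary-token logit values
    bandwidth : current available bandwidth (same units as bw_max)
    bw_max    : bandwidth at which the full k_max logits are sent

    Returns a sparse {vocab_index: logit} dict. The linear
    bandwidth-to-k schedule below is illustrative, not the
    paper's exact channel-state mapping.
    """
    frac = max(0.0, min(1.0, bandwidth / bw_max))
    k = max(k_min, round(k_min + frac * (k_max - k_min)))
    k = min(k, len(logits))
    # Select indices of the k largest logit values.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    return {i: logits[i] for i in top}
```

Under poor channel conditions (`bandwidth` near 0) only `k_min` logits are transmitted; as conditions improve, k grows toward `k_max`, trading bandwidth for distillation fidelity.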
📝 Abstract
Federated learning (FL) for large language models (LLMs) offers a privacy-preserving scheme, enabling clients to collaboratively fine-tune locally deployed LLMs or smaller language models (SLMs) without exchanging raw data. While parameter-sharing methods in traditional FL solve a number of technical challenges, they still incur high communication overhead and struggle to adapt to heterogeneous model architectures. Federated distillation, a framework for mutual knowledge transfer via shared logits, typically offers lower communication overhead than parameter-sharing methods. However, transmitting logits from LLMs remains challenging for bandwidth-limited clients due to their high dimensionality. In this work, we focus on federated LLM distillation with efficient communication. To achieve this, we first propose an adaptive Top-k logit selection mechanism that dynamically sparsifies logits according to real-time communication conditions. Then, to tackle the dimensional inconsistency introduced by the adaptive sparsification, we design an adaptive logits aggregation scheme that effectively alleviates the artificial and uninformative inputs introduced by conventional zero-padding methods. Finally, to enhance the distillation effect, we incorporate a LoRA-adapted hidden-layer projection from the LLM into the distillation loss, further reducing communication overhead while providing richer representations. Experimental results demonstrate that our scheme achieves superior performance compared to baseline methods while effectively reducing communication overhead by approximately 50%.
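The aggregation scheme above addresses the fact that different clients, operating under different channel conditions, transmit sparse logit sets of different sizes. A minimal sketch of zero-padding-free aggregation is shown below: each vocabulary index is averaged only over the clients that actually transmitted it, rather than padding every client's vector to the full vocabulary with zeros. The function name `aggregate_sparse_logits` and the unweighted-mean rule are illustrative assumptions, not the paper's exact scheme.

```python
from collections import defaultdict

def aggregate_sparse_logits(client_logits):
    """Aggregate variable-size sparse logit sets from heterogeneous clients.

    client_logits : list of {vocab_index: logit} dicts, one per client,
                    each possibly covering a different subset of the
                    vocabulary (e.g. different Top-k sizes).

    Each index is averaged only over the clients that sent it, so
    absent entries never contribute artificial zeros to the mean.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for sparse in client_logits:
        for idx, val in sparse.items():
            sums[idx] += val
            counts[idx] += 1
    return {idx: sums[idx] / counts[idx] for idx in sums}
```

With conventional zero-padding, an index sent by only one of N clients would have its logit diluted by N-1 artificial zeros; here it retains its transmitted value.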