FedSODA: Federated Fine-tuning of LLMs via Similarity Group Pruning and Orchestrated Distillation Alignment

📅 2025-08-18
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the high computational, memory, and communication overheads of federated fine-tuning (FFT) of large language models (LLMs) on resource-constrained clients, this paper proposes FedSODA, a lightweight and efficient framework. Methodologically, FedSODA introduces three key components: (i) similarity group pruning (SGP), which constructs lightweight sub-LLMs by pruning redundant layers from the full LLM while retaining the layers most critical to model performance; (ii) orchestrated distillation alignment (ODA), which reduces gradient divergence between the sub-LLM and the full LLM during fine-tuning without requiring clients to store or transmit the full model; and (iii) QLoRA-based compression, so that clients deploy quantized sub-LLMs and fine-tune only lightweight adapters. Experimental results on three open-source LLMs demonstrate that FedSODA reduces average communication cost by 70.6% and storage usage by 75.6% compared to baseline methods, while improving task accuracy by 3.1%. Crucially, it achieves these gains while preserving data privacy, overcoming the full-model fine-tuning bottleneck in federated LLM learning.

📝 Abstract
Federated fine-tuning (FFT) of large language models (LLMs) has recently emerged as a promising solution to enable domain-specific adaptation while preserving data privacy. Despite its benefits, FFT on resource-constrained clients is hindered by the high computational and memory demands of full-model fine-tuning, which limits its practical advancement. This paper presents FedSODA, a resource-efficient FFT framework that enables clients to adapt LLMs without accessing or storing the full model. Specifically, we first propose a similarity group pruning (SGP) module, which prunes redundant layers from the full LLM while retaining the most critical layers to preserve model performance. Moreover, we introduce an orchestrated distillation alignment (ODA) module to reduce gradient divergence between the sub-LLM and the full LLM during FFT. Through the use of QLoRA, clients only need to deploy quantized sub-LLMs and fine-tune lightweight adapters, significantly reducing local resource requirements. We conduct extensive experiments on three open-source LLMs across a variety of downstream tasks. The experimental results demonstrate that FedSODA reduces communication overhead by an average of 70.6%, decreases storage usage by 75.6%, and improves task accuracy by 3.1%, making it highly suitable for practical FFT applications under resource constraints.
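The paper does not spell out the SGP scoring rule here, but similarity-based layer pruning is commonly done by measuring how little a layer changes its input: a layer whose output is nearly identical to its input is redundant and can be dropped. A minimal sketch of that idea (the function name, activation format, and cosine-similarity criterion are illustrative assumptions, not the paper's exact algorithm):

```python
import numpy as np

def similarity_group_prune(layer_acts, keep):
    """Rank layers by input/output cosine similarity and keep the
    `keep` most influential ones.

    layer_acts: list of (n_tokens, hidden) arrays, where layer_acts[i]
    is the input to layer i and layer_acts[i + 1] is its output.
    A layer whose output closely matches its input (high cosine
    similarity) contributes little and is a pruning candidate.
    Returns the sorted indices of the layers to retain.
    """
    scores = []
    for i in range(len(layer_acts) - 1):
        a, b = layer_acts[i], layer_acts[i + 1]
        cos = np.sum(a * b, axis=-1) / (
            np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + 1e-8
        )
        scores.append(cos.mean())  # high similarity => redundant layer
    order = np.argsort(scores)     # least similar (most critical) first
    return sorted(order[:keep].tolist())

# Toy example: 6 "layers"; layer 3 barely transforms its input,
# so it should be the one pruned when keeping 5 layers.
rng = np.random.default_rng(0)
acts = [rng.normal(size=(4, 8)) for _ in range(7)]
acts[4] = acts[3] + 1e-3 * rng.normal(size=(4, 8))  # layer 3 ~ identity
kept = similarity_group_prune(acts, keep=5)
```

In this toy run, `kept` contains every layer index except 3, since layer 3's near-identity behavior gives it the highest input/output similarity.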
Problem

Research questions and friction points this paper is trying to address.

Reduces resource demands for federated LLM fine-tuning
Prunes redundant layers to maintain model performance
Minimizes gradient divergence between sub-LLM and full LLM
Innovation

Methods, ideas, or system contributions that make the work stand out.

Similarity group pruning for layer reduction
Orchestrated distillation alignment for gradient consistency
QLoRA for quantized sub-LLMs and adapters
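The ODA objective itself is not given on this page, but "reducing gradient divergence between the sub-LLM and the full LLM" is the standard goal of a distillation loss: penalize the sub-LLM when its output distribution drifts from the full model's. A minimal sketch using temperature-scaled KL divergence (the loss form and temperature are generic distillation assumptions, not FedSODA's exact formulation):

```python
import numpy as np

def softmax(x, T=1.0):
    """Numerically stable softmax with temperature T."""
    z = x / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_alignment_loss(sub_logits, full_logits, T=2.0):
    """KL(full || sub) on temperature-softened logits.

    Pushes the pruned sub-LLM's (student) predictions back toward the
    full LLM's (teacher), reducing the divergence that layer pruning
    introduces. Scaled by T^2, as is conventional in distillation.
    """
    p = softmax(full_logits, T)   # teacher distribution
    q = softmax(sub_logits, T)    # student distribution
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return (T * T) * kl.mean()

# Identical logits give zero loss; any drift gives a positive penalty.
logits = np.array([[2.0, 0.5, -1.0]])
loss_same = distillation_alignment_loss(logits, logits)
loss_diff = distillation_alignment_loss(logits + np.array([[1.0, -1.0, 0.0]]), logits)
```

In practice this term would be added to the client's task loss during local QLoRA adapter training, so the adapters learn the downstream task while staying aligned with the full model's behavior.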