🤖 AI Summary
To address the high communication overhead of federated fine-tuning of large language models (LLMs) under heterogeneous network conditions, the structural redundancy of LoRA parameters, and the conflicts that arise when aggregating client updates, this paper proposes FedSRD, a Sparsify-Reconstruct-Decompose framework. First, LoRA updates are sparsified on the client via importance-aware pruning; second, the server reconstructs the sparse updates in the full-rank space before aggregation, mitigating heterogeneity-induced conflicts; third, the aggregated update is decomposed into a sparse low-rank format for efficient broadcast. The result is a symmetric communication cycle that exploits, rather than suffers from, the structural redundancy of LoRA parameters. A lightweight variant, FedSRD-e, is further introduced to reduce local computational cost. Experiments on 10 benchmarks demonstrate up to a 90% reduction in communication cost while also improving model performance on heterogeneous client data.
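As a concrete illustration of the first step, below is a minimal sketch of importance-aware sparsification of a client's LoRA factors before upload. It assumes a magnitude-based importance score over LoRA rank components (the product of the corresponding factor norms); the paper's actual importance criterion and sparsity pattern may differ.

```python
import torch

def sparsify_lora_update(A: torch.Tensor, B: torch.Tensor, keep_ratio: float = 0.5):
    """Importance-aware pruning of a client's LoRA update before upload (sketch).

    A: (r, d_in) down-projection, B: (d_out, r) up-projection, so delta_W = B @ A.
    Rank component i contributes B[:, i] outer A[i, :]; its importance is
    approximated here by the product of the factor norms (an assumption).
    """
    r = A.shape[0]
    k = max(1, int(round(keep_ratio * r)))
    importance = B.norm(dim=0) * A.norm(dim=1)   # one score per rank component
    keep = torch.topk(importance, k).indices     # components retained for upload
    mask = torch.zeros(r, dtype=torch.bool)
    mask[keep] = True
    # Zero out pruned components; only the kept rows/columns need transmitting.
    A_sparse = torch.where(mask[:, None], A, torch.zeros_like(A))
    B_sparse = torch.where(mask[None, :], B, torch.zeros_like(B))
    return A_sparse, B_sparse, mask
```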
📝 Abstract
The current paradigm of training large language models (LLMs) on publicly available Web data is becoming unsustainable, with high-quality data sources in specialized domains nearing exhaustion. Federated Learning (FL) emerges as a practical solution for the next generation of AI on a decentralized Web, enabling privacy-preserving collaborative fine-tuning by leveraging private data distributed across a global client base. While Low-Rank Adaptation (LoRA) is the standard for efficient fine-tuning, its application in federated settings presents a critical challenge: communication overhead remains a significant bottleneck across the Web's heterogeneous network conditions. The structural redundancy within LoRA parameters not only incurs a heavy communication burden but also introduces conflicts when aggregating client updates. To address this, we propose FedSRD, a Sparsify-Reconstruct-Decompose framework designed for communication-efficient FL. We first introduce an importance-aware sparsification method that preserves the structural integrity of LoRA updates to reduce the uploaded parameter count. The server then reconstructs and aggregates these updates in a full-rank space to mitigate conflicts. Finally, it decomposes the global update into a sparse low-rank format for broadcast, ensuring a symmetrically efficient cycle. We also propose an efficient variant, FedSRD-e, to reduce computational overhead. Experimental results on 10 benchmarks demonstrate that our framework significantly reduces communication costs by up to 90% while even improving model performance on heterogeneous client data.
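For intuition on the server side, the sketch below reconstructs each client's LoRA update in the full-rank space, averages them, and re-factorizes the result with a truncated SVD for broadcast. The uniform averaging and SVD-based decomposition are illustrative assumptions, not necessarily the exact procedures used in FedSRD.

```python
import torch

def reconstruct_aggregate_decompose(client_updates, rank: int):
    """client_updates: list of (A_i, B_i) sparse LoRA factors from clients.

    Returns a global low-rank update (A_g, B_g) such that B_g @ A_g approximates
    the average of the clients' full-rank updates.
    """
    # Reconstruct: combine updates as full matrices B_i @ A_i, avoiding the
    # conflicts that arise from averaging the A and B factors separately.
    # Uniform weights are an assumption; weighting by client data size is common.
    delta_w = torch.stack([B @ A for A, B in client_updates]).mean(dim=0)

    # Decompose: truncated SVD back to a compact low-rank form for broadcast.
    U, S, Vh = torch.linalg.svd(delta_w, full_matrices=False)
    B_g = U[:, :rank] * S[:rank]    # (d_out, rank)
    A_g = Vh[:rank, :]              # (rank, d_in)
    return A_g, B_g
```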