🤖 AI Summary
This work addresses the challenges of fine-tuning large language models in federated settings on resource-constrained edge devices, where high communication overhead, substantial memory consumption, and the suboptimal performance of existing parameter-efficient methods hinder practical deployment. To overcome these limitations, the paper proposes FedKRSO, which, for the first time, introduces random subspace optimization to federated large language model fine-tuning. The server uses $K$ shared random seeds to generate a set of low-dimensional subspaces; clients perform gradient updates within these subspaces and transmit only the accumulated parameter changes, thereby compressing communication. FedKRSO achieves performance comparable to full-parameter fine-tuning while significantly reducing both communication and memory costs. Extensive experiments across multiple federated scenarios on the GLUE benchmark demonstrate that FedKRSO substantially outperforms current parameter-efficient approaches, effectively breaking through their performance bottlenecks.
📝 Abstract
Fine-tuning is essential to adapt general-purpose large language models (LLMs) to domain-specific tasks. As a privacy-preserving framework that leverages decentralized data for collaborative model training, Federated Learning (FL) is gaining popularity in LLM fine-tuning, but remains challenging due to the high cost of transmitting full model parameters and computing full gradients on resource-constrained clients. While Parameter-Efficient Fine-Tuning (PEFT) methods are widely used in FL to reduce communication and memory costs, they often sacrifice model performance compared to full-parameter fine-tuning (FFT). This paper proposes FedKRSO (Federated $K$-Seed Random Subspace Optimization), a novel method that enables communication- and memory-efficient FFT of LLMs in federated settings. In FedKRSO, clients update the model within a shared set of random low-dimensional subspaces generated by the server, reducing memory usage. Furthermore, instead of transmitting full model parameters in each FL round, clients send only the model-update accumulators along the subspaces to the server, enabling efficient global model aggregation and dissemination. With these strategies, FedKRSO substantially reduces communication and memory overhead while overcoming the performance limitations of PEFT, closely approximating the performance of federated FFT. The convergence properties of FedKRSO are analyzed rigorously under general FL settings. Extensive experiments on the GLUE benchmark across diverse FL scenarios demonstrate that FedKRSO achieves both superior performance and low communication and memory overhead, paving the way toward federated LLM fine-tuning at the resource-constrained edge.
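The communication-compression idea behind the abstract can be illustrated with a minimal sketch: because clients and server share the $K$ random seeds, each client can regenerate the subspace directions locally, send only $K$ scalar coefficients per round, and the server can rebuild the full-dimension update from the same seeds. The function names, shapes, and the plain projected-gradient step below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

D = 8                  # full parameter dimension (tiny, for illustration only)
K = 4                  # number of shared random seeds / subspace directions
SEEDS = [0, 1, 2, 3]   # seeds broadcast by the server (hypothetical values)

def direction(seed, dim=D):
    """Regenerate one random subspace direction from its shared seed."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

def client_update(grad, lr=0.1):
    """Project the local gradient onto the shared subspace directions and
    return only K scalar coefficients (the update accumulator)."""
    return np.array([-lr * (direction(s) @ grad) for s in SEEDS])

def server_aggregate(all_coeffs):
    """Average the clients' K coefficients, then reconstruct the
    full-dimension model update from the shared seeds --
    no full parameter vectors ever cross the network."""
    mean = np.mean(all_coeffs, axis=0)
    return sum(c * direction(s) for c, s in zip(mean, SEEDS))

# Two clients with different local gradients
g1, g2 = np.ones(D), np.arange(D, dtype=float)
delta = server_aggregate([client_update(g1), client_update(g2)])
print(delta.shape)  # (8,): full update rebuilt from K=4 scalars per client
```

Per round, each client uploads K floats instead of D parameters, which is the source of the communication savings; memory savings come from optimizing only within the low-dimensional subspace.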