Ferret: Federated Full-Parameter Tuning at Scale for Large Language Models

📅 2024-09-10
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
📄 PDF
🤖 AI Summary
To simultaneously preserve privacy, reduce communication overhead, and maintain model accuracy when full-parameter fine-tuning large language models (LLMs) in federated learning, this paper proposes Ferret, the first first-order method with shared randomness for scalable full-parameter tuning across decentralized data sources. Methodologically, it coordinates local updates via a shared randomness mechanism, projecting them into a low-dimensional space and reconstructing them stochastically from that space, which drastically compresses communication without sacrificing accuracy. Theoretically, the authors establish convergence guarantees under standard assumptions. Empirically, the approach achieves up to 2.1× faster convergence and reduces communication costs by 76% across multiple benchmark tasks while matching the model accuracy of centralized fine-tuning. This work establishes a new paradigm for privacy-preserving, communication-efficient, and scalable LLM fine-tuning in federated environments.
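The projection-plus-reconstruction idea can be illustrated with a minimal sketch (the names project_update and reconstruct_update and all sizes are hypothetical; this is not the authors' implementation): the sender compresses a full-parameter update into a handful of coefficients along random directions, and the receiver regenerates the same directions from a shared seed, so the directions themselves are never transmitted.

```python
# Minimal sketch of shared-randomness compression (assumed names; not Ferret's actual code).
import numpy as np

def project_update(delta: np.ndarray, seed: int, k: int) -> np.ndarray:
    """Compress a d-dimensional update into k coefficients along shared random directions."""
    rng = np.random.default_rng(seed)              # shared seed -> identical directions everywhere
    return np.array([delta @ rng.standard_normal(delta.size) for _ in range(k)])

def reconstruct_update(coeffs: np.ndarray, d: int, seed: int) -> np.ndarray:
    """Rebuild an unbiased approximation of the update from the coefficients alone."""
    rng = np.random.default_rng(seed)              # regenerate the identical directions
    delta_hat = np.zeros(d)
    for c in coeffs:
        delta_hat += c * rng.standard_normal(d)
    return delta_hat / len(coeffs)                 # E[(delta @ v) v] = delta for standard Gaussian v

delta = np.random.default_rng(1).standard_normal(8_192)    # stand-in for a local first-order update
coeffs = project_update(delta, seed=42, k=128)              # only 128 floats are communicated
approx = reconstruct_update(coeffs, d=delta.size, seed=42)  # receiver rebuilds the full-size update
```

The per-update estimate is unbiased but noisy; averaging reconstructed updates across clients and rounds, as in the paper's aggregation step, is what keeps the overall procedure accurate.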

📝 Abstract
Large Language Models (LLMs) have become indispensable in numerous real-world applications. Unfortunately, fine-tuning these models at scale, especially in federated settings where data privacy and communication efficiency are critical, presents significant challenges. Existing methods often resort to parameter-efficient fine-tuning (PEFT) to mitigate communication overhead, but this typically comes at the cost of model accuracy. To address these limitations, we propose federated full-parameter tuning at scale for LLMs (Ferret), the first first-order method with shared randomness to enable scalable full-parameter tuning of LLMs across decentralized data sources while maintaining competitive model accuracy. Ferret accomplishes this through three aspects: (1) it employs widely applied first-order methods for efficient local updates; (2) it projects these updates into a low-dimensional space to considerably reduce communication overhead; and (3) it reconstructs local updates from this low-dimensional space with shared randomness to facilitate effective full-parameter global aggregation, ensuring fast convergence and competitive final performance. Our rigorous theoretical analyses and insights, along with extensive experiments, show that Ferret significantly enhances the scalability of existing federated full-parameter tuning approaches by achieving high computational efficiency, reduced communication overhead, and fast convergence, all while maintaining competitive model accuracy. Our implementation is available at https://github.com/allen4747/Ferret.
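To see how the three aspects fit together in one round of full-parameter global aggregation, here is a self-contained toy simulation (all names and sizes such as d, k, num_clients, and round_seed are illustrative assumptions, not values from the paper): each client ships only k coefficients, and the server regenerates the shared directions from the round seed to reconstruct and average the full-parameter updates.

```python
# Toy federated round under the shared-randomness scheme (hypothetical, not the authors' code).
import numpy as np

d, k, num_clients, round_seed = 10_000, 64, 4, 2024

def directions(seed: int) -> np.ndarray:
    """k shared random directions, regenerated identically on clients and server."""
    return np.random.default_rng(seed).standard_normal((k, d))

# Clients: run ordinary first-order local updates, then project them.
rng = np.random.default_rng(0)
local_updates = [rng.standard_normal(d) for _ in range(num_clients)]  # stand-ins for real SGD deltas
V = directions(round_seed)
messages = [V @ delta for delta in local_updates]                     # each client sends k floats, not d

# Server: regenerate the same directions (only seeds and coefficients travel),
# reconstruct each client's full-parameter update, and average for global aggregation.
V_server = directions(round_seed)
reconstructed = [(V_server.T @ c) / k for c in messages]              # unbiased estimate of each delta
global_update = np.mean(reconstructed, axis=0)
print(f"communicated {k} values per client instead of {d}")
```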
Problem

Research questions and friction points this paper is trying to address.

Federated full-parameter tuning for LLMs with privacy and efficiency
Reducing communication overhead without sacrificing model accuracy
Scalable first-order method for decentralized LLM fine-tuning
Innovation

Methods, ideas, or system contributions that make the work stand out.

First-order method with shared randomness
Low-dimensional projection for efficiency
Effective full-parameter global aggregation
👥 Authors
Yao Shu - Guangdong Lab of AI and Digital Economy (SZ), China
Wenyang Hu - National University of Singapore (research interests: machine learning)
See-Kiong Ng - School of Computing and Institute of Data Science, National University of Singapore (research interests: artificial intelligence, natural language processing, data mining, smart cities, bioinformatics)
B. Low - Department of Computer Science, National University of Singapore
Fei Richard Yu - Carleton University, Canada