Mitigating Non-IID Drift in Zeroth-Order Federated LLM Fine-Tuning with Transferable Sparsity

📅 2025-06-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenges of high communication overhead, memory pressure, and degraded aggregation performance caused by data heterogeneity in non-IID federated fine-tuning of large language models (LLMs), this paper proposes Meerkat, a sparse zeroth-order (ZO) optimization framework. Methodologically, Meerkat introduces three key innovations: (1) maintaining a fixed, extremely sparse subset of transferable parameters for frequent, low-cost synchronization; (2) discovering and leveraging the Gradient Inner Product (GradIP) phenomenon, via Meerkat-vp, to identify severely non-IID clients and enable dynamic early stopping; and (3) the first deep integration of transferable static sparsity with ZO optimization. The authors provide theoretical convergence guarantees. Experiments on multiple non-IID benchmarks demonstrate that Meerkat significantly outperforms full-parameter ZO and state-of-the-art sparse methods, achieving markedly improved communication efficiency and an average 12.7% gain in aggregation quality (measured by ROUGE-L).
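The core mechanism described above, restricting ZO fine-tuning to a fixed sparse subset of parameters, can be sketched as a two-point (SPSA-style) gradient estimate that perturbs only masked coordinates. This is a minimal illustration, not the paper's implementation; the function name `sparse_zo_grad` and the specific perturbation scale are assumptions.

```python
import numpy as np

def sparse_zo_grad(loss_fn, theta, mask, eps=1e-3, seed=0):
    """Two-point zeroth-order gradient estimate restricted to a fixed
    sparse mask (hypothetical sketch of Meerkat-style sparse ZO).
    Only masked coordinates are perturbed, so a client only ever
    updates, and needs to synchronize, those entries."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(theta.shape) * mask  # perturb the sparse subset only
    loss_plus = loss_fn(theta + eps * z)
    loss_minus = loss_fn(theta - eps * z)
    # Finite-difference estimate projected onto the perturbation direction
    return ((loss_plus - loss_minus) / (2 * eps)) * z
```

Because the gradient estimate is zero outside the mask, the parameter delta sent to the server is as sparse as the mask itself, which is what enables the high-frequency synchronization the summary highlights.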

📝 Abstract
Federated Learning enables collaborative fine-tuning of Large Language Models (LLMs) across decentralized Non-Independent and Identically Distributed (Non-IID) clients, but such models' massive parameter sizes lead to significant memory and communication challenges. This work introduces Meerkat, a sparse zeroth-order optimization (ZO) method designed for federated LLM fine-tuning. By limiting fine-tuning to a transferable, static, extremely sparse subset of parameters, Meerkat achieves remarkable communication efficiency, enabling cost-effective high-frequency synchronization. With theoretical analysis and experiments, we show that this high-frequency communication effectively mitigates Non-IID data challenges and leads to superior performance compared to full-parameter ZO. Furthermore, experimental results show that Meerkat outperforms existing sparsity baselines at the same communication frequency. To further handle Non-IID drift, Meerkat leverages traceable local updates and forms a virtual path for each client. This virtual path mechanism reveals the GradIP phenomenon: the inner products between LLM pre-training gradients maintained by the server and client gradients estimated via ZO converge for extreme Non-IID clients but oscillate for IID ones. This distinct behavior provides a signal for identifying clients with extreme data heterogeneity. Using this signal, Meerkat-vp analyzes GradIP trajectories to identify extreme Non-IID clients and applies early stopping to enhance aggregated model quality. Experiments confirm that Meerkat and Meerkat-vp significantly improve the efficiency and effectiveness of ZO federated LLM fine-tuning.
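The GradIP signal described in the abstract, inner products between server-held pre-training gradients and client ZO gradient estimates that converge for extreme Non-IID clients but oscillate for IID ones, can be sketched as follows. This is an illustrative reading of that behavior under stated assumptions, not the paper's detector: the function names, the trailing-window test, and the `tol` threshold are all hypothetical.

```python
import numpy as np

def gradip_trajectory(server_grads, client_grads):
    """One GradIP value per round: the inner product between the
    server's pre-training gradient and the client's ZO gradient
    estimate for that round (hypothetical sketch)."""
    return [float(np.dot(s, c)) for s, c in zip(server_grads, client_grads)]

def is_extreme_non_iid(traj, window=5, tol=1e-2):
    """Flag a client as extreme Non-IID if its GradIP trajectory has
    settled (low variance over the trailing window); per the abstract,
    IID clients keep oscillating. `window` and `tol` are illustrative
    assumptions, not values from the paper."""
    if len(traj) < window:
        return False
    return float(np.std(traj[-window:])) < tol
```

A server running this check per round could then early-stop flagged clients before aggregation, which is the role the abstract assigns to Meerkat-vp.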
Problem

Research questions and friction points this paper is trying to address.

Mitigates Non-IID data drift in federated LLM fine-tuning
Reduces memory and communication costs via sparse ZO optimization
Identifies extreme Non-IID clients using GradIP signal analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparse zeroth-order optimization for federated LLM fine-tuning
Transferable static sparse subset for communication efficiency
Virtual path mechanism to identify Non-IID clients