🤖 AI Summary
Federated fine-tuning of large language models (LLMs) incurs prohibitive memory overhead, hindering participation from resource-constrained edge devices.
Method: We propose an efficient federated fine-tuning framework tailored for edge deployment. It combines a macro-level, functionality-driven layer orchestration with a micro-level, importance-aware fine-grained pruning strategy, and is the first to prune Multi-Head Attention and Feed-Forward Network components independently and modularly. The framework integrates grouped layer pruning, functionality-oriented compression, and dynamic importance assessment, while supporting privacy-preserving personalized submodel construction within the federated learning paradigm.
Contribution/Results: Experiments demonstrate a 75% reduction in peak memory consumption and up to a 1.98% improvement in average accuracy over state-of-the-art methods, establishing new performance–efficiency trade-offs for on-device federated LLM adaptation.
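The macro-micro idea above can be sketched in a few lines. The paper does not publish code, so everything here is an assumption for illustration: layers are grouped contiguously as a stand-in for functionality-driven orchestration, importance scores are taken as given, and the memory budget is expressed as a layer count.

```python
# Hypothetical sketch of macro-micro layer selection; grouping and scoring
# are illustrative stand-ins, not the paper's actual implementation.
from typing import Dict, List

def build_submodel(
    importance: Dict[int, float],  # per-layer importance score (assumed given)
    group_size: int,               # macro level: contiguous layers per group
    memory_budget: int,            # max number of layers the device can hold
) -> List[int]:
    """Select layers for one device: partition layers into groups (macro),
    then keep the most important layers within each group (micro)."""
    layers = sorted(importance)
    groups = [layers[i:i + group_size] for i in range(0, len(layers), group_size)]
    # Keep roughly the same share of layers from every group, so each
    # functional stage of the network survives in the submodel.
    keep_per_group = max(1, memory_budget // len(groups))
    selected: List[int] = []
    for group in groups:
        ranked = sorted(group, key=lambda l: importance[l], reverse=True)
        selected.extend(ranked[:keep_per_group])
    return sorted(selected)[:memory_budget]

# Example: a 12-layer model, groups of 4, device memory for 6 layers.
scores = {i: float((i * 7) % 12) for i in range(12)}
sub = build_submodel(scores, group_size=4, memory_budget=6)
```

Each device would then fine-tune only its selected layers, which is where the reported peak-memory savings come from.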
📝 Abstract
Federated fine-tuning enables privacy-preserving Large Language Model (LLM) adaptation, but its high memory cost limits participation from resource-constrained devices. We propose FedPruner, an innovative federated fine-tuning paradigm that tackles this via intelligent layer pruning. FedPruner flexibly prunes the global model, creating personalized submodels based on device memory constraints. It employs a macro-micro synergistic pruning framework: a macro-level functionality-driven layer orchestration mechanism groups layers, while a micro-level importance-aware layer selection strategy prunes within groups to build device-specific submodels. We further introduce a fine-grained variant that independently prunes Multi-Head Attention and Feed-Forward Network components to precisely preserve critical architectural elements. Extensive experimental results demonstrate that FedPruner significantly outperforms state-of-the-art approaches, achieving up to a 1.98% improvement in average model accuracy while reducing peak memory usage by 75%.
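The fine-grained variant can be illustrated similarly: instead of dropping whole layers, the Multi-Head Attention and Feed-Forward sub-blocks of each layer are scored and kept or pruned independently. The joint ranking below is an assumed selection rule for illustration, not the paper's published algorithm.

```python
# Illustrative sketch of independent MHA/FFN pruning: rank all
# (layer, component) pairs jointly, so a layer may keep its FFN while
# dropping its attention block. Scoring is an assumption.
from typing import Dict, List, Tuple

def prune_components(
    mha_scores: Dict[int, float],  # per-layer attention-block importance
    ffn_scores: Dict[int, float],  # per-layer feed-forward importance
    component_budget: int,         # total sub-blocks the device can keep
) -> List[Tuple[int, str]]:
    """Return the (layer, component) pairs to retain in the submodel."""
    candidates = [(l, "mha", s) for l, s in mha_scores.items()]
    candidates += [(l, "ffn", s) for l, s in ffn_scores.items()]
    candidates.sort(key=lambda c: c[2], reverse=True)
    return sorted((l, name) for l, name, _ in candidates[:component_budget])

# Example: layer 0 has an important attention block, layer 1 an important FFN.
kept = prune_components({0: 0.9, 1: 0.1}, {0: 0.5, 1: 0.8}, component_budget=2)
```

This finer granularity is what lets the method preserve critical architectural elements that whole-layer pruning would discard.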