Memory-Efficient Federated Fine-Tuning of Large Language Models via Layer Pruning

📅 2025-08-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Federated fine-tuning of large language models (LLMs) incurs prohibitive memory overhead, hindering participation from resource-constrained edge devices. Method: We propose an efficient federated fine-tuning framework tailored for edge deployment. It introduces a macro-level, functionality-driven layer orchestration mechanism and a micro-level, importance-aware fine-grained pruning strategy, enabling the first independent, modular pruning of attention heads and feed-forward network components. The framework integrates grouped layer pruning, functionality-oriented compression, and dynamic importance assessment, while supporting privacy-preserving construction of personalized submodels within the federated learning paradigm. Contribution/Results: Experiments demonstrate a 75% reduction in peak memory consumption and up to a 1.98% average accuracy improvement over state-of-the-art methods, establishing new performance–efficiency trade-offs for on-device federated LLM adaptation.

📝 Abstract
Federated fine-tuning enables privacy-preserving Large Language Model (LLM) adaptation, but its high memory cost limits participation from resource-constrained devices. We propose FedPruner, an innovative federated fine-tuning paradigm that tackles this via intelligent layer pruning. FedPruner flexibly prunes the global model, creating personalized submodels based on device memory constraints. It employs a macro-micro synergistic pruning framework: a macro-level functionality-driven layer orchestration mechanism groups layers, while a micro-level importance-aware layer selection strategy prunes within groups to build device-specific submodels. We further introduce a fine-grained variant that independently prunes Multi-Head Attention and Feed-Forward Network components to precisely preserve critical architectural elements. Extensive experimental results demonstrate that FedPruner significantly outperforms state-of-the-art approaches, achieving up to a 1.98% improvement in average model accuracy while reducing peak memory usage by 75%.
Problem

Research questions and friction points this paper is trying to address.

High memory cost of federated LLM fine-tuning
Exclusion of resource-constrained devices from training
Preserving accuracy while pruning model layers
Innovation

Methods, ideas, or system contributions that make the work stand out.

Layer pruning for memory efficiency
Macro-micro synergistic pruning framework
Fine-grained component-level pruning strategy
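The macro-micro pruning idea can be sketched as follows. This is a minimal illustrative Python sketch, not the paper's implementation: the contiguous grouping scheme, the importance scores, and the per-component threshold are all assumptions made here for clarity.

```python
def group_layers(num_layers, num_groups):
    """Macro level: partition layer indices into contiguous functional groups.

    (Assumes equal-sized contiguous groups; the paper's functionality-driven
    orchestration may group layers differently.)
    """
    size = num_layers // num_groups
    return [list(range(g * size, (g + 1) * size)) for g in range(num_groups)]

def select_layers(groups, importance, keep_per_group):
    """Micro level: within each group, keep only the most important layers.

    `keep_per_group` would be set per device from its memory budget.
    """
    kept = []
    for group in groups:
        ranked = sorted(group, key=lambda i: importance[i], reverse=True)
        kept.extend(sorted(ranked[:keep_per_group]))
    return kept

def fine_grained_mask(kept, mha_importance, ffn_importance, threshold):
    """Fine-grained variant: independently decide, per retained layer,
    whether to keep its Multi-Head Attention and Feed-Forward components."""
    return {
        i: {"mha": mha_importance[i] >= threshold,
            "ffn": ffn_importance[i] >= threshold}
        for i in kept
    }

# Example: a 12-layer model, 4 groups, keep 2 layers per group (8 total).
# Importance scores are made up for illustration.
importance = [0.9, 0.2, 0.5, 0.8, 0.1, 0.7, 0.6, 0.3, 0.95, 0.4, 0.85, 0.15]
groups = group_layers(12, 4)
kept = select_layers(groups, importance, keep_per_group=2)
# Reusing the same scores for both components, purely as a placeholder.
mask = fine_grained_mask(kept, importance, importance, threshold=0.5)
```

In a federated round, each device would receive only the layers (and components) its mask retains, which is how the personalized submodels keep peak memory within the device's budget.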