🤖 AI Summary
Federated fine-tuning of large language models (LLMs) incurs prohibitive memory overhead, hindering participation from resource-constrained edge devices.
Method: We propose an efficient federated fine-tuning framework tailored for edge deployment. It combines a macro-level, functionality-driven layer orchestration with a micro-level, importance-aware fine-grained pruning strategy, and is the first to prune Multi-Head Attention and Feed-Forward Network components independently and modularly. The framework integrates grouped layer pruning, functionality-oriented compression, and dynamic importance assessment, while supporting privacy-preserving personalized submodel construction within the federated learning paradigm.
Contribution/Results: Experiments demonstrate a 75% reduction in peak memory consumption and up to a 1.98% improvement in average accuracy over state-of-the-art methods, establishing new performance–efficiency trade-offs for on-device federated LLM adaptation.
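The macro-micro idea above can be sketched in a few lines. The paper does not publish code, so everything here is an assumption for illustration: layers are grouped contiguously as a stand-in for functionality-driven orchestration, importance scores are taken as given, and the memory budget is expressed as a layer count.

```python
# Hypothetical sketch of macro-micro layer selection; grouping and scoring
# are illustrative stand-ins, not the paper's actual implementation.
from typing import Dict, List

def build_submodel(
    importance: Dict[int, float],  # per-layer importance score (assumed given)
    group_size: int,               # macro level: contiguous layers per group
    memory_budget: int,            # max number of layers the device can hold
) -> List[int]:
    """Select layers for one device: partition layers into groups (macro),
    then keep the most important layers within each group (micro)."""
    layers = sorted(importance)
    groups = [layers[i:i + group_size] for i in range(0, len(layers), group_size)]
    # Keep roughly the same share of layers from every group, so each
    # functional stage of the network survives in the submodel.
    keep_per_group = max(1, memory_budget // len(groups))
    selected: List[int] = []
    for group in groups:
        ranked = sorted(group, key=lambda l: importance[l], reverse=True)
        selected.extend(ranked[:keep_per_group])
    return sorted(selected)[:memory_budget]

# Example: a 12-layer model, groups of 4, device memory for 6 layers.
scores = {i: float((i * 7) % 12) for i in range(12)}
sub = build_submodel(scores, group_size=4, memory_budget=6)
```

Each device would then fine-tune only its selected layers, which is where the reported peak-memory savings come from.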
📝 Abstract
Federated fine-tuning enables privacy-preserving Large Language Model (LLM) adaptation, but its high memory cost limits participation from resource-constrained devices. We propose FedPruner, an innovative federated fine-tuning paradigm that tackles this via intelligent layer pruning. FedPruner flexibly prunes the global model, creating personalized submodels based on device memory constraints. It employs a macro-micro synergistic pruning framework: a macro-level functionality-driven layer orchestration mechanism groups layers, while a micro-level importance-aware layer selection strategy prunes within groups to build device-specific submodels. We further introduce a fine-grained variant that independently prunes Multi-Head Attention and Feed-Forward Network components to precisely preserve critical architectural elements. Extensive experimental results demonstrate that FedPruner significantly outperforms state-of-the-art approaches, achieving up to a 1.98% improvement in average model accuracy while reducing peak memory usage by 75%.
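The fine-grained variant can be illustrated similarly: instead of dropping whole layers, the Multi-Head Attention and Feed-Forward sub-blocks of each layer are scored and kept or pruned independently. The joint ranking below is an assumed selection rule for illustration, not the paper's published algorithm.

```python
# Illustrative sketch of independent MHA/FFN pruning: rank all
# (layer, component) pairs jointly, so a layer may keep its FFN while
# dropping its attention block. Scoring is an assumption.
from typing import Dict, List, Tuple

def prune_components(
    mha_scores: Dict[int, float],  # per-layer attention-block importance
    ffn_scores: Dict[int, float],  # per-layer feed-forward importance
    component_budget: int,         # total sub-blocks the device can keep
) -> List[Tuple[int, str]]:
    """Return the (layer, component) pairs to retain in the submodel."""
    candidates = [(l, "mha", s) for l, s in mha_scores.items()]
    candidates += [(l, "ffn", s) for l, s in ffn_scores.items()]
    candidates.sort(key=lambda c: c[2], reverse=True)
    return sorted((l, name) for l, name, _ in candidates[:component_budget])

# Example: layer 0 has an important attention block, layer 1 an important FFN.
kept = prune_components({0: 0.9, 1: 0.1}, {0: 0.5, 1: 0.8}, component_budget=2)
```

This finer granularity is what lets the method preserve critical architectural elements that whole-layer pruning would discard.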