Exploring Federated Pruning for Large Language Models

📅 2025-05-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Deploying large language models (LLMs) on resource-constrained devices in privacy-sensitive scenarios is hindered by existing compression methods, which rely on public calibration data and thus risk privacy leakage. Method: We propose FedPrLLM, the first privacy-preserving LLM compression framework that integrates pruning into federated learning. Each client independently computes layer-wise pruning masks using only its local private data; the global model is compressed by aggregating these masks, so neither raw data nor calibration samples are ever shared. Contribution/Results: We identify an optimal configuration: one-shot pruning, layer comparison, and no weight rescaling. Extensive experiments across multiple LLMs and datasets demonstrate that FedPrLLM significantly outperforms baselines, with controlled accuracy degradation and low communication overhead. The implementation is open-sourced.

📝 Abstract
LLM pruning has emerged as a promising technology for compressing LLMs, enabling their deployment on resource-limited devices. However, current methodologies typically require access to public calibration samples, which can be challenging to obtain in privacy-sensitive domains. To address this issue, we introduce FedPrLLM, a comprehensive federated pruning framework designed for the privacy-preserving compression of LLMs. In FedPrLLM, each client only needs to calculate a pruning mask matrix based on its local calibration data and share it with the server to prune the global model. This approach allows for collaborative pruning of the global model with the knowledge of each client while maintaining local data privacy. Additionally, we conduct extensive experiments to explore various possibilities within the FedPrLLM framework, including different comparison groups, pruning strategies, and the decision to scale weights. Our extensive evaluation reveals that one-shot pruning with layer comparison and no weight scaling is the optimal choice within the FedPrLLM framework. We hope our work will help guide future efforts in pruning LLMs in privacy-sensitive fields. Our code is available at https://github.com/Pengxin-Guo/FedPrLLM.
Problem

Research questions and friction points this paper is trying to address.

Compressing LLMs for resource-limited devices
Pruning LLMs without public calibration samples
Preserving data privacy in federated pruning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Federated pruning for privacy-preserving LLM compression
Clients share pruning masks, not local data
One-shot pruning with layer comparison and no weight scaling found optimal
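The protocol above can be sketched in a few lines. Neither the summary nor the abstract spells out the exact importance score or aggregation rule, so this is a minimal illustration under stated assumptions: `client_mask` uses plain weight magnitude as a stand-in for a calibration-based importance score, and `aggregate_masks` uses a simple vote count; the function names and the majority-style aggregation are hypothetical, not the paper's exact method.

```python
import numpy as np

def client_mask(weight: np.ndarray, sparsity: float) -> np.ndarray:
    """Client step (sketch): rank all weights in the layer by magnitude
    ("layer comparison") and keep the top (1 - sparsity) fraction.
    Magnitude stands in for the paper's calibration-based importance."""
    k = int(weight.size * (1.0 - sparsity))
    flat = np.abs(weight).ravel()
    threshold = np.partition(flat, -k)[-k]  # k-th largest magnitude
    return (np.abs(weight) >= threshold).astype(np.int8)

def aggregate_masks(masks: list, sparsity: float) -> np.ndarray:
    """Server step (sketch): sum client votes per weight and keep the
    most-voted entries. Only binary masks, never data, cross the network."""
    votes = np.sum(masks, axis=0)
    k = int(votes.size * (1.0 - sparsity))
    threshold = np.partition(votes.ravel(), -k)[-k]
    return (votes >= threshold).astype(np.int8)

# Three clients, one 8x8 layer, 50% sparsity. Each client sees a slightly
# different view of the layer, mimicking divergent local calibration data.
rng = np.random.default_rng(0)
weights = rng.normal(size=(8, 8))
masks = [client_mask(weights + rng.normal(scale=0.1, size=weights.shape), 0.5)
         for _ in range(3)]
global_mask = aggregate_masks(masks, 0.5)
pruned = weights * global_mask  # one-shot prune, no weight rescaling
print(global_mask.mean())  # fraction kept; can exceed 0.5 on vote ties
```

One-shot here means the mask is applied once rather than pruning and re-ranking iteratively, and "no weight scaling" means surviving weights are left untouched after masking.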
👥 Authors
Pengxin Guo
The University of Hong Kong
Yinong Wang
The University of Hong Kong
Wei Li
Southern University of Science and Technology
Mengting Liu
Sun Yat-Sen University
Neuroimaging · Neurodevelopment · Artificial Intelligence · Cognitive Neuroscience
Ming Li
Guangming Laboratory
Jinkai Zheng
Hangzhou Dianzi University
Liangqiong Qu
The University of Hong Kong
Medical Image Analysis · Image Synthesis · Illumination Modeling · Federated Learning