Exploring Federated Pruning for Large Language Models

📅 2025-05-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Deploying large language models (LLMs) on resource-constrained devices in privacy-sensitive scenarios is hindered by existing compression methods, which rely on public calibration data and thus risk privacy leakage. Method: We propose FedPrLLM, the first privacy-preserving LLM compression framework that integrates pruning into federated learning. Each client independently computes layer-wise pruning masks using only its local private data; the global model is compressed by aggregating these masks, so neither raw data nor calibration samples are ever shared. Contribution/Results: We identify an optimal configuration: one-shot pruning, layer comparison, and no weight rescaling. Extensive experiments across multiple LLMs and datasets demonstrate that FedPrLLM significantly outperforms baselines, with controlled accuracy degradation and low communication overhead. The implementation is open-sourced.

📝 Abstract
LLM pruning has emerged as a promising technology for compressing LLMs, enabling their deployment on resource-limited devices. However, current methodologies typically require access to public calibration samples, which can be challenging to obtain in privacy-sensitive domains. To address this issue, we introduce FedPrLLM, a comprehensive federated pruning framework designed for the privacy-preserving compression of LLMs. In FedPrLLM, each client only needs to calculate a pruning mask matrix based on its local calibration data and share it with the server to prune the global model. This approach allows for collaborative pruning of the global model with the knowledge of each client while maintaining local data privacy. Additionally, we conduct extensive experiments to explore various possibilities within the FedPrLLM framework, including different comparison groups, pruning strategies, and the decision to scale weights. Our extensive evaluation reveals that one-shot pruning with layer comparison and no weight scaling is the optimal choice within the FedPrLLM framework. We hope our work will help guide future efforts in pruning LLMs in privacy-sensitive fields. Our code is available at https://github.com/Pengxin-Guo/FedPrLLM.
Problem

Research questions and friction points this paper is trying to address.

Compressing LLMs for resource-limited devices
Pruning LLMs without public calibration samples
Preserving data privacy in federated pruning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Federated pruning for privacy-preserving LLM compression
Clients share pruning masks, not local data
One-shot pruning with layer comparison and no weight scaling found optimal
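The protocol above can be sketched in a few lines. Neither the summary nor the abstract spells out the exact importance score or aggregation rule, so this is a minimal illustration under stated assumptions: `client_mask` uses plain weight magnitude as a stand-in for a calibration-based importance score, and `aggregate_masks` uses a simple vote count; the function names and the majority-style aggregation are hypothetical, not the paper's exact method.

```python
import numpy as np

def client_mask(weight: np.ndarray, sparsity: float) -> np.ndarray:
    """Client step (sketch): rank all weights in the layer by magnitude
    ("layer comparison") and keep the top (1 - sparsity) fraction.
    Magnitude stands in for the paper's calibration-based importance."""
    k = int(weight.size * (1.0 - sparsity))
    flat = np.abs(weight).ravel()
    threshold = np.partition(flat, -k)[-k]  # k-th largest magnitude
    return (np.abs(weight) >= threshold).astype(np.int8)

def aggregate_masks(masks: list, sparsity: float) -> np.ndarray:
    """Server step (sketch): sum client votes per weight and keep the
    most-voted entries. Only binary masks, never data, cross the network."""
    votes = np.sum(masks, axis=0)
    k = int(votes.size * (1.0 - sparsity))
    threshold = np.partition(votes.ravel(), -k)[-k]
    return (votes >= threshold).astype(np.int8)

# Three clients, one 8x8 layer, 50% sparsity. Each client sees a slightly
# different view of the layer, mimicking divergent local calibration data.
rng = np.random.default_rng(0)
weights = rng.normal(size=(8, 8))
masks = [client_mask(weights + rng.normal(scale=0.1, size=weights.shape), 0.5)
         for _ in range(3)]
global_mask = aggregate_masks(masks, 0.5)
pruned = weights * global_mask  # one-shot prune, no weight rescaling
print(global_mask.mean())  # fraction kept; can exceed 0.5 on vote ties
```

One-shot here means the mask is applied once rather than pruning and re-ranking iteratively, and "no weight scaling" means surviving weights are left untouched after masking.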
👥 Authors
Pengxin Guo
The University of Hong Kong
Yinong Wang
The University of Hong Kong
Wei Li
Southern University of Science and Technology
Mengting Liu
Sun Yat-Sen University
Neuroimaging · Neurodevelopment · Artificial Intelligence · Cognitive Neuroscience
Ming Li
Guangming Laboratory
Jinkai Zheng
Hangzhou Dianzi University
Liangqiong Qu
The University of Hong Kong
Medical Image Analysis · Image Synthesis · Illumination Modeling · Federated Learning