Pruning General Large Language Models into Customized Expert Models

📅 2025-06-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Deploying general-purpose large language models (LLMs) incurs substantial inference overhead, and existing pruning methods struggle to simultaneously preserve expert-level performance on specialized tasks and retain broad general capabilities. Method: We propose a three-dimensional customized pruning paradigm—fine-grained structured pruning along the language, domain, and task dimensions—guided by neuron importance analysis, multi-dimensional semantic alignment evaluation, and cross-model-family generalization strategies, enabling zero-shot expert model generation without post-training. Contribution/Results: Evaluated across mainstream model families (Llama, Qwen, Phi), our approach achieves <0.8% average accuracy degradation on expert tasks while retaining over 96% of general capabilities, significantly outperforming state-of-the-art pruning methods. This marks the first zero-post-training framework for generating high-fidelity expert LLMs without compromising versatility.

📝 Abstract
Large language models (LLMs) have revolutionized natural language processing, yet their substantial model sizes often require substantial computational resources. To preserve computing resources and accelerate inference speed, it is crucial to prune redundant parameters, especially for experienced users who often need compact expert models tailored to specific downstream scenarios. However, most existing pruning methods focus on preserving the model's general capabilities, often requiring extensive post-training or suffering from degraded performance due to coarse-grained pruning. In this work, we design a $\underline{Cus}$tom $\underline{Prun}$ing method ($\texttt{Cus-Prun}$) to prune a large general model into a smaller lightweight expert model, which is positioned along the "language", "domain" and "task" dimensions. By identifying and pruning irrelevant neurons of each dimension, $\texttt{Cus-Prun}$ creates expert models without any post-training. Our experiments demonstrate that $\texttt{Cus-Prun}$ consistently outperforms other methods, achieving minimal loss in both expert and general capabilities across various models from different model families and sizes.
Problem

Research questions and friction points this paper is trying to address.

Prune large general LLMs into compact expert models
Reduce computational resources without post-training
Maintain expert and general capabilities after pruning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Prunes general LLMs into compact expert models
Identifies and removes irrelevant neurons efficiently
No post-training needed for expert model creation
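The mechanism described above can be sketched as importance-guided structured pruning. This is a minimal illustration, not the paper's actual algorithm: it assumes neuron importance is scored by mean absolute activation on a calibration set for the target dimension (language, domain, or task), compared against a general-purpose calibration set, and that MLP weight rows/columns for low-scoring neurons are removed outright, with no post-training. All function names and the scoring ratio are hypothetical.

```python
import numpy as np

def neuron_importance(activations: np.ndarray) -> np.ndarray:
    """Mean absolute activation per neuron over a calibration set.
    activations: (num_tokens, num_neurons)."""
    return np.abs(activations).mean(axis=0)

def select_expert_neurons(target_acts: np.ndarray,
                          general_acts: np.ndarray,
                          keep_ratio: float = 0.5) -> np.ndarray:
    """Keep neurons most relevant to the target dimension.
    Score = importance on target data relative to general data
    (a hypothetical criterion for illustration only)."""
    score = neuron_importance(target_acts) / (neuron_importance(general_acts) + 1e-8)
    k = int(len(score) * keep_ratio)
    return np.sort(np.argsort(score)[-k:])  # indices of retained neurons

def prune_mlp(w_in: np.ndarray, w_out: np.ndarray,
              keep_idx: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Structured pruning of one MLP block:
    w_in: (hidden, d_model), w_out: (d_model, hidden).
    Dropping a hidden neuron removes a row of w_in and a column of w_out."""
    return w_in[keep_idx, :], w_out[:, keep_idx]

# Toy example with random weights and activations.
rng = np.random.default_rng(0)
d_model, hidden, n_tok = 8, 16, 100
w_in = rng.normal(size=(hidden, d_model))
w_out = rng.normal(size=(d_model, hidden))
target_acts = rng.normal(size=(n_tok, hidden))   # e.g. domain-specific texts
general_acts = rng.normal(size=(n_tok, hidden))  # e.g. mixed general corpus
keep = select_expert_neurons(target_acts, general_acts, keep_ratio=0.5)
w_in_pruned, w_out_pruned = prune_mlp(w_in, w_out, keep)
print(w_in_pruned.shape, w_out_pruned.shape)  # (8, 8) (8, 8)
```

With `keep_ratio=0.5` the hidden dimension shrinks from 16 to 8, halving the MLP's parameter count; the pruned model is used directly, with no fine-tuning step.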