🤖 AI Summary
Existing large language model (LLM) pruning methods suffer from severe cross-task generalization bottlenecks—particularly catastrophic performance degradation on sentiment classification—and exhibit high sensitivity to calibration set selection.
Method: We propose the first neuron-level semantic attribution framework for LLMs, integrating gradient- and activation-based signals to enable fine-grained, interpretable semantic characterization of unpruned neurons; we further design a neuron importance recalibration mechanism to enhance pruning robustness.
Contribution/Results: Comprehensive evaluation across 24 datasets and four task categories demonstrates that our method substantially mitigates sentiment classification accuracy loss (average improvement of 12.7%) and enables semantic mapping of over 90% of critical neurons to concrete linguistic concepts. This work establishes, for the first time, an interpretable link between LLM pruning efficacy and the underlying semantic functionality of individual neurons.
📝 Abstract
Model pruning technique is vital for accelerating large language models by reducing their size and computational requirements. However, the generalizability of existing pruning methods across diverse datasets and tasks remains unclear. Thus, we conduct extensive evaluations on 24 datasets and 4 tasks using popular pruning methods. Based on these evaluations, we find and then investigate that calibration set greatly affect the performance of pruning methods. In addition, we surprisingly find a significant performance drop of existing pruning methods in sentiment classification tasks. To understand the link between performance drop and pruned neurons, we propose Neuron Semantic Attribution, which learns to associate each neuron with specific semantics. This method first makes the unpruned neurons of LLMs explainable.