Prune&Comp: Free Lunch for Layer-Pruned LLMs via Iterative Pruning with Magnitude Compensation

📅 2025-07-24
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Layer pruning induces a hidden-state magnitude mismatch that severely degrades LLM performance. This work is the first to identify the phenomenon and proposes Prune&Comp, a training-free, plug-and-play magnitude-compensation pruning framework. It calibrates hidden-state magnitudes via offline weight rescaling and further improves pruning quality through an iterative prune-and-compensate loop. The method is fully compatible with mainstream block-importance metrics and incurs zero runtime overhead. On LLaMA-3-8B with five layers pruned, it nearly halves perplexity while retaining 93.19% of the original model's question-answering performance (a 4.01% absolute improvement over the baseline) and significantly outperforms existing training-free pruning approaches.

πŸ“ Abstract
Layer pruning has emerged as a promising technique for compressing large language models (LLMs) while achieving acceleration proportional to the pruning ratio. In this work, we identify that removing any layer induces a significant magnitude gap in hidden states, resulting in substantial performance degradation. To address this issue, we propose Prune&Comp, a novel plug-and-play layer pruning scheme that leverages magnitude compensation to mitigate such gaps in a training-free manner. Specifically, we first estimate the magnitude gap caused by layer removal and then eliminate this gap by rescaling the remaining weights offline, with zero runtime overhead incurred. We further demonstrate the advantages of Prune&Comp through an iterative pruning strategy. When integrated with an iterative prune-and-compensate loop, Prune&Comp consistently enhances existing layer pruning metrics. For instance, when 5 layers of LLaMA-3-8B are pruned using the prevalent block influence metric, Prune&Comp nearly halves the perplexity and retains 93.19% of the original model's question-answering performance, outperforming the baseline by 4.01%.
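The compensation step the abstract describes (estimate the magnitude gap a removed layer leaves behind, then fold that gap into the remaining weights offline) could be sketched roughly as follows. This is a minimal NumPy illustration: the function names, the mean-norm gap estimator, and the one-weight-matrix-per-layer model are assumptions for clarity, not the paper's actual implementation.

```python
import numpy as np

def magnitude_gap(hidden_in: np.ndarray, hidden_out: np.ndarray) -> float:
    """Estimate the hidden-state magnitude gap across a block slated
    for removal, as the ratio of mean token-wise norms of its output
    to its input (captured on calibration data)."""
    return (np.linalg.norm(hidden_out, axis=-1).mean()
            / np.linalg.norm(hidden_in, axis=-1).mean())

def compensate(next_layer_weight: np.ndarray, gap: float) -> np.ndarray:
    """Offline rescaling: fold the estimated gap into the weights that
    consume the hidden state downstream of the pruned block. Inference
    code is unchanged, so no runtime overhead is incurred."""
    return next_layer_weight * gap
```

For a linear map `y = x @ W`, scaling the input by `gap` is equivalent to scaling `W` by `gap`, which is why the correction can be baked into the checkpoint offline.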
Problem

Research questions and friction points this paper is trying to address.

Mitigate performance degradation from layer pruning in LLMs
Compensate magnitude gaps in hidden states without training
Enhance pruning metrics via iterative prune-and-compensate strategy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free magnitude compensation for pruned layers
Iterative pruning strategy for enhanced performance
Offline weight rescaling with zero runtime overhead
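The three points above can be combined into a toy prune-and-compensate loop: score each block with a block-influence-style metric, remove the least influential one, rescale the next remaining weights by the estimated gap, and repeat so later rounds score a magnitude-consistent model. Purely for illustration, each "layer" here is a single linear map; the actual method operates on transformer blocks and its scoring details may differ.

```python
import numpy as np

def block_influence(x_in: np.ndarray, x_out: np.ndarray) -> float:
    """Block-influence-style score: 1 minus the mean cosine similarity
    between a block's input and output (a low score means the block
    changes the hidden state little, so it is a pruning candidate)."""
    cos = np.sum(x_in * x_out, axis=-1) / (
        np.linalg.norm(x_in, axis=-1) * np.linalg.norm(x_out, axis=-1))
    return 1.0 - cos.mean()

def iterative_prune(layers, calib, num_prune):
    """Remove one layer per round; after each removal, fold the
    estimated magnitude gap into the next remaining layer's weights
    before the next round's importance scores are computed."""
    for _ in range(num_prune):
        # Forward pass over calibration data, recording per-layer I/O.
        acts, x = [], calib
        for W in layers:
            y = x @ W
            acts.append((x, y))
            x = y
        # Drop the least influential layer.
        idx = min(range(len(layers)),
                  key=lambda i: block_influence(*acts[i]))
        x_in, x_out = acts[idx]
        gap = (np.linalg.norm(x_out, axis=-1).mean()
               / np.linalg.norm(x_in, axis=-1).mean())
        del layers[idx]
        # Compensate: rescale the weights that now see the gap.
        if idx < len(layers):
            layers[idx] = layers[idx] * gap
    return layers
```

The key design point reflected here is that compensation happens inside the loop, so each subsequent pruning decision is made on hidden states whose magnitudes have already been corrected.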
Xinrui Chen
Tsinghua University
Efficient Deep Learning, Computer Vision

Hongxing Zhang
School of Information Science and Technology, Guangdong University of Foreign Studies

Fanyi Zeng
Shenzhen International Graduate School, Tsinghua University

Yongxian Wei
Tsinghua University
Machine Learning

Yizhi Wang
Shenzhen International Graduate School, Tsinghua University

Xitong Ling
Tsinghua University
AI4Pathology, Foundation-Model, Vision-Language-Model

Guanghao Li
Fudan University
Graphics

Chun Yuan
Shenzhen International Graduate School, Tsinghua University