Restoring Pruned Large Language Models via Lost Component Compensation

📅 2025-10-22
📈 Citations: 0 · Influential: 0
🤖 AI Summary
To address the substantial performance degradation of pruned large language models (LLMs) and the incompatibility of existing parameter-efficient fine-tuning (PEFT) methods with their sparse architectures, this paper proposes RestoreLCC, the first plug-in recovery framework grounded in attention activation difference analysis. Its core innovation is to precisely identify the attention heads most affected by information loss by comparing attention activations before and after pruning, localize the best compensation positions via activation editing and contrastive probing, and reconstruct the lost information through lightweight component injection, without adding any inference overhead. RestoreLCC fully preserves model sparsity and supports diverse pruning structures. Experiments show that RestoreLCC consistently outperforms state-of-the-art recovery methods on both general and domain-specific tasks, achieving average accuracy gains of 2.1–4.7 percentage points.
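The head-identification step described above can be sketched in a few lines: compare each attention head's mean activation before and after pruning, and rank heads by the magnitude of the difference. This is a hypothetical illustration only; the function name, array shapes, and the simple L2 ranking are assumptions, and the paper's actual procedure uses contrastive probing and activation editing whose details are not given in this summary.

```python
import numpy as np

def rank_lossy_heads(dense_acts, pruned_acts, top_k=4):
    """Rank attention heads by pruning-induced activation loss.

    dense_acts, pruned_acts: arrays of shape (n_heads, d_head) holding
    mean attention-head outputs over a probe set, before and after
    pruning. Returns (indices of the top_k most-affected heads,
    the per-head activation differences).
    Hypothetical sketch; not the paper's exact probing procedure.
    """
    diff = dense_acts - pruned_acts            # "lost component" per head
    loss = np.linalg.norm(diff, axis=1)        # magnitude of the loss
    order = np.argsort(loss)[::-1]             # most-affected heads first
    return order[:top_k], diff
```

In this toy form the ranking only needs forward passes of the dense and pruned models on a small probe set, which matches the summary's claim that the method is lightweight.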

📝 Abstract
Pruning is a widely used technique to reduce the size and inference cost of large language models (LLMs), but it often causes performance degradation. To mitigate this, existing restoration methods typically employ parameter-efficient fine-tuning (PEFT), such as LoRA, to recover the pruned model's performance. However, most PEFT methods are designed for dense models and overlook the distinct properties of pruned models, often resulting in suboptimal recovery. In this work, we propose a targeted restoration strategy for pruned models that restores performance while preserving their low cost and high efficiency. We observe that pruning-induced information loss is reflected in attention activations, and selectively reintroducing components of this information can significantly recover model performance. Based on this insight, we introduce RestoreLCC (Restoring Pruned LLMs via Lost Component Compensation), a plug-and-play method that contrastively probes critical attention heads via activation editing, extracts lost components from activation differences, and finally injects them back into the corresponding pruned heads for compensation and recovery. RestoreLCC is compatible with structured, semi-structured, and unstructured pruning schemes. Extensive experiments demonstrate that RestoreLCC consistently outperforms state-of-the-art baselines in both general and task-specific performance recovery, without compromising the sparsity or inference efficiency of pruned models.
Problem

Research questions and friction points this paper is trying to address.

Restoring performance of pruned large language models efficiently
Addressing information loss in pruned models via activation analysis
Compensating lost components without sacrificing sparsity or efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Compensates lost components via activation editing
Probes critical attention heads contrastively
Injects extracted components into pruned heads
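The last step above, injecting extracted components back into pruned heads, can be illustrated as adding the stored activation difference to the selected heads' outputs. The function name and shapes are assumptions for illustration; in a real implementation such fixed vectors could be folded into the heads' output projections, consistent with the paper's claim of no inference overhead.

```python
import numpy as np

def inject_compensation(head_outputs, lost_components, head_ids):
    """Add extracted lost components back into selected pruned heads.

    head_outputs: (n_heads, d_head) activations of the pruned model.
    lost_components: (n_heads, d_head) activation differences measured
    against the dense model. head_ids: indices of heads to compensate.
    Hypothetical sketch of the compensation step, not the paper's code.
    """
    patched = head_outputs.copy()              # leave the input untouched
    patched[head_ids] += lost_components[head_ids]
    return patched
```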
👥 Authors
Zijian Feng · School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore
Hanzhang Zhou · Nanyang Technological University · Large Language Models, Mechanistic Interpretability, Natural Language Processing
Zixiao Zhu · Nanyang Technological University · artificial intelligence
Tianjiao Li · School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore
Jia Jim Deryl Chua · Home Team Science and Technology Agency (HTX), Singapore
Lee Onn Mak · Home Team Science and Technology Agency (HTX), Singapore
Gee Wah Ng · Home Team Science and Technology Agency (HTX), Singapore
Kezhi Mao · Nanyang Technological University · machine learning, natural language processing, image processing, information fusion