Rethinking Layer Removal: Preserving Critical Components with Task-Aware Singular Value Decomposition

📅 2024-12-31
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address internal inconsistency and task performance degradation caused by layer-wise pruning in large language models (LLMs), this paper proposes Taco-SVD, a task-aware singular value decomposition framework. Methodologically, Taco-SVD innovatively couples gradient-based attribution with singular vector selection: it first identifies task-critical linear transformation directions via lightweight attribution mapping; then selects gradient-weighted singular vectors to preserve task-sensitive components; finally enforces inter-layer consistency constraints to ensure structural stability post-compression. The framework is architecture-agnostic and requires no fine-tuning for deployment. Experiments across diverse LLMs demonstrate that Taco-SVD consistently reduces perplexity and improves downstream task accuracy by 5.2% on average, while increasing computational overhead by less than 0.3%. Its key contributions include the first integration of gradient attribution into SVD direction selection, the introduction of layer-wise consistency regularization for compressed LLMs, and a plug-and-play compression method achieving high task fidelity with minimal computational cost.
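The gradient-weighted singular-vector selection described above can be sketched roughly as follows. This is a minimal illustration, not the paper's exact formulation: the function name, the attribution score (singular value times the absolute gradient projection onto each singular direction), and the shapes are all assumptions for the sketch.

```python
import numpy as np

def taco_svd_compress(W, G, rank):
    """Illustrative sketch of task-aware low-rank compression.

    W: a layer's weight matrix.
    G: gradient of the task loss w.r.t. W (the attribution signal).
    Keeps the `rank` singular directions with the highest
    gradient-weighted scores instead of the largest singular values.
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    # Hypothetical attribution score for direction i: sigma_i * |u_i^T G v_i|
    scores = S * np.abs(np.einsum("mi,mn,in->i", U, G, Vt))
    keep = np.argsort(scores)[::-1][:rank]
    # Reconstruct W from only the task-critical directions
    return U[:, keep] @ np.diag(S[keep]) @ Vt[keep, :]
```

The key design point is that the kept directions are ranked by task sensitivity rather than by magnitude alone, so a small-singular-value direction that matters for the downstream loss can survive compression.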

📝 Abstract
Layer removal has emerged as a promising approach for compressing large language models (LLMs) by exploiting redundancy across layers to reduce model size and accelerate inference. However, this technique often compromises internal consistency, leading to performance degradation and instability, with varying impacts across model architectures. In this work, we propose Taco-SVD, a task-aware framework that retains task-critical singular value directions, preserving internal consistency while enabling efficient compression. Unlike direct layer removal, this preserves the task-critical transformations and thereby mitigates performance degradation. By leveraging gradient-based attribution methods, Taco-SVD aligns singular value selection with downstream task objectives. Extensive evaluations demonstrate that Taco-SVD outperforms existing methods in perplexity and task performance across different architectures while incurring minimal computational overhead.
Problem

Research questions and friction points this paper is trying to address.

Model Compression
Performance Degradation
Task-aware Optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Taco-SVD
Model Compression
Performance Preservation
Kainan Liu
Ping An Technology (Shenzhen) Co., Ltd., China; The Hong Kong University of Science and Technology (Guangzhou)
Yong Zhang
Ping An Technology (Shenzhen) Co., Ltd., China
Ning Cheng
TeraHop
Zhitao Li
Ping An Technology (Shenzhen) Co., Ltd., China
Shaojun Wang
Soochow University, TU/e, University of Strasbourg
Nanophotonics · Light-matter interactions · Nanofabrication
Jing Xiao
Ping An Technology (Shenzhen) Co., Ltd., China