🤖 AI Summary
Existing SVD-based compression methods for large language models (LLMs) often neglect the protection of critical singular components, leading to significant performance degradation. To address this, we propose a dual-importance preservation mechanism: (1) globally, dynamically allocating layer-wise compression ratios so that less important layers absorb more of the compression burden; and (2) locally, enhancing retention of salient singular vectors via channel-weighted data whitening. Our approach preserves the hardware compatibility and theoretical interpretability inherent to SVD while substantially improving compression robustness. Extensive experiments demonstrate that our method consistently outperforms state-of-the-art SVD compression baselines across multiple benchmarks. Notably, it maintains strong performance even under extreme compression (e.g., retaining only 20% of the original singular values), supporting efficient and reliable LLM deployment.
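The global allocation step above can be sketched as a simple heuristic: score each layer's importance (for instance, by the loss increase when that layer alone is compressed), then give low-importance layers a smaller kept-rank ratio. This is a minimal illustration under assumed inputs, not the paper's actual allocation rule; the importance scores and the clipping bounds here are hypothetical.

```python
import numpy as np

def allocate_kept_ratios(importance, target_ratio):
    """Assign per-layer kept-rank ratios so that, on average, the model
    keeps `target_ratio` of its singular values, with less important
    layers compressed more aggressively. Heuristic sketch only."""
    importance = np.asarray(importance, dtype=float)
    # Invert and normalize: low importance -> large share of the burden.
    inv = 1.0 / importance
    share = inv / inv.sum()
    # Each layer gives up a share of the total budget to remove;
    # before clipping, the mean kept ratio equals target_ratio exactly.
    kept = 1.0 - share * (1.0 - target_ratio) * len(importance)
    # Clip to keep every layer usable (clipping can shift the mean slightly).
    return np.clip(kept, 0.05, 1.0)

# Hypothetical importance scores for three layers (higher = more important).
ratios = allocate_kept_ratios([1.0, 2.0, 4.0], target_ratio=0.5)
```

In this toy run the least important layer receives the smallest kept ratio and the most important layer the largest, while the average stays at the 50% target.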
📝 Abstract
The ever-increasing computational demands and deployment costs of large language models (LLMs) have spurred numerous compression methods. Compared to quantization and unstructured pruning, SVD compression offers superior hardware compatibility and theoretical guarantees. However, existing SVD-based methods focus on the overall discrepancy between the original and compressed matrices while overlooking the protection of critical components within the matrix, which leads to inferior performance in the compressed models. This paper proposes DipSVD, a dual-level importance protection mechanism that enhances SVD-based compression: (1) local importance protection: preserving the most critical singular vectors within each weight matrix through channel-weighted data whitening; and (2) global importance protection: enabling less important layers to bear a greater portion of the compression burden through either a heuristic or optimization-based approach, thereby minimizing the impact of compression on critical layers. Extensive experiments demonstrate that DipSVD outperforms existing SVD-based compression approaches across multiple benchmarks, achieving superior model performance especially at high compression ratios.
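The local protection idea can be illustrated with a small sketch: decompose a weight matrix in an activation-weighted norm so that truncation preserves the directions that matter most for the data, then fold the weighting back out. The per-channel importance measure below (activation energy from calibration data) is an assumption for illustration; the paper's channel-weighted whitening may be defined differently.

```python
import numpy as np

def whitened_svd_compress(W, X, rank, eps=1e-6):
    """Low-rank factorization W ≈ A @ B using a channel-weighted SVD.
    W: (out_dim, in_dim) weight matrix; X: (n_samples, in_dim) calibration
    activations. Illustrative sketch, not the paper's exact procedure."""
    # Per-input-channel importance: RMS activation magnitude (assumed proxy).
    importance = np.sqrt((X ** 2).mean(axis=0)) + eps
    S = np.diag(importance)            # channel-weighting matrix
    S_inv = np.diag(1.0 / importance)
    # SVD of the weighted matrix: truncation error is now measured in a
    # norm that emphasizes high-energy input channels.
    U, sigma, Vt = np.linalg.svd(W @ S, full_matrices=False)
    A = U[:, :rank] * sigma[:rank]     # (out_dim, rank)
    B = Vt[:rank] @ S_inv              # (rank, in_dim), weighting undone
    return A, B

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 128))
X = rng.standard_normal((256, 128))
A, B = whitened_svd_compress(W, X, rank=32)   # W ≈ A @ B at half rank
```

Replacing `W` by the pair `(A, B)` turns one dense matmul into two thinner ones, which is what gives SVD compression its hardware-friendly structure.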