🤖 AI Summary
Existing SVD-based compression methods for large language models (LLMs) often neglect the protection of critical singular components, leading to significant performance degradation. To address this, we propose a dual-importance preservation mechanism: (1) globally, dynamically allocating layer-wise compression ratios so that less important layers absorb more of the compression burden; and (2) locally, enhancing retention of salient singular vectors via channel-weighted data whitening. Our approach preserves the hardware compatibility and theoretical interpretability inherent to SVD while substantially improving compression robustness. Extensive experiments demonstrate that our method consistently outperforms state-of-the-art SVD compression baselines across multiple benchmarks. Notably, it maintains strong performance even under extreme compression (e.g., retaining only 20% of the original singular values), supporting efficient and reliable LLM deployment.
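The global allocation step above can be sketched as a simple heuristic: score each layer's importance (for instance, by the loss increase when that layer alone is compressed), then give low-importance layers a smaller kept-rank ratio. This is a minimal illustration under assumed inputs, not the paper's actual allocation rule; the importance scores and the clipping bounds here are hypothetical.

```python
import numpy as np

def allocate_kept_ratios(importance, target_ratio):
    """Assign per-layer kept-rank ratios so that, on average, the model
    keeps `target_ratio` of its singular values, with less important
    layers compressed more aggressively. Heuristic sketch only."""
    importance = np.asarray(importance, dtype=float)
    # Invert and normalize: low importance -> large share of the burden.
    inv = 1.0 / importance
    share = inv / inv.sum()
    # Each layer gives up a share of the total budget to remove;
    # before clipping, the mean kept ratio equals target_ratio exactly.
    kept = 1.0 - share * (1.0 - target_ratio) * len(importance)
    # Clip to keep every layer usable (clipping can shift the mean slightly).
    return np.clip(kept, 0.05, 1.0)

# Hypothetical importance scores for three layers (higher = more important).
ratios = allocate_kept_ratios([1.0, 2.0, 4.0], target_ratio=0.5)
```

In this toy run the least important layer receives the smallest kept ratio and the most important layer the largest, while the average stays at the 50% target.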
📝 Abstract
The ever-increasing computational demands and deployment costs of large language models (LLMs) have spurred numerous compression methods. Compared to quantization and unstructured pruning, SVD compression offers superior hardware compatibility and theoretical guarantees. However, existing SVD-based methods focus on the overall discrepancy between the original and compressed matrices while overlooking the protection of critical components within the matrix, which leads to inferior performance in the compressed models. This paper proposes DipSVD, a dual-level importance protection mechanism that enhances SVD-based compression: (1) local importance protection: preserving the most critical singular vectors within each weight matrix through channel-weighted data whitening; and (2) global importance protection: enabling less important layers to bear a greater portion of the compression burden through either a heuristic or optimization-based approach, thereby minimizing the impact of compression on critical layers. Extensive experiments demonstrate that DipSVD outperforms existing SVD-based compression approaches across multiple benchmarks, achieving superior model performance especially at high compression ratios.
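The local protection idea can be illustrated with a small sketch: decompose a weight matrix in an activation-weighted norm so that truncation preserves the directions that matter most for the data, then fold the weighting back out. The per-channel importance measure below (activation energy from calibration data) is an assumption for illustration; the paper's channel-weighted whitening may be defined differently.

```python
import numpy as np

def whitened_svd_compress(W, X, rank, eps=1e-6):
    """Low-rank factorization W ≈ A @ B using a channel-weighted SVD.
    W: (out_dim, in_dim) weight matrix; X: (n_samples, in_dim) calibration
    activations. Illustrative sketch, not the paper's exact procedure."""
    # Per-input-channel importance: RMS activation magnitude (assumed proxy).
    importance = np.sqrt((X ** 2).mean(axis=0)) + eps
    S = np.diag(importance)            # channel-weighting matrix
    S_inv = np.diag(1.0 / importance)
    # SVD of the weighted matrix: truncation error is now measured in a
    # norm that emphasizes high-energy input channels.
    U, sigma, Vt = np.linalg.svd(W @ S, full_matrices=False)
    A = U[:, :rank] * sigma[:rank]     # (out_dim, rank)
    B = Vt[:rank] @ S_inv              # (rank, in_dim), weighting undone
    return A, B

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 128))
X = rng.standard_normal((256, 128))
A, B = whitened_svd_compress(W, X, rank=32)   # W ≈ A @ B at half rank
```

Replacing `W` by the pair `(A, B)` turns one dense matmul into two thinner ones, which is what gives SVD compression its hardware-friendly structure.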