🤖 AI Summary
To address the deployment challenges posed by the excessive parameter count of large language models (LLMs), this paper proposes an adaptive singular value decomposition (SVD) compression method tailored to matrix heterogeneity. Unlike conventional approaches that apply uniform SVD across entire weight matrices, our method performs column-wise error analysis to identify and preserve high-error columns, while applying SVD only to low-error columns. Compression intensity is dynamically allocated across layers via a data-driven thresholding mechanism and non-uniform rank assignment, jointly optimizing reconstruction fidelity and compression efficiency. Experiments demonstrate that, at identical compression ratios, our method achieves significantly lower perplexity and higher zero-shot task accuracy across multiple mainstream LLMs compared to existing SVD-based baselines, confirming its effectiveness and generalizability.
📝 Abstract
The rapid advancement of Large Language Models (LLMs) faces a critical bottleneck in their immense size, necessitating efficient compression techniques. While Singular Value Decomposition (SVD) is a promising approach, existing SVD-based methods treat the entire parameter matrix uniformly, overlooking the fact that SVD approximation errors vary significantly across different parts of the matrix, which often leads to suboptimal compression. To address this, we propose **C**olumn-**P**reserving **S**ingular **V**alue **D**ecomposition (CPSVD), a novel method that refines SVD-based LLM compression by intelligently segmenting the parameter matrix. Unlike traditional SVD, CPSVD identifies and directly preserves the columns with high decomposition error, applies SVD only to the columns with low decomposition error, and precisely determines the optimal balance point between these two strategies to minimize overall error. Furthermore, leveraging the inherent heterogeneity of decomposition errors across the matrices within an LLM layer, CPSVD adaptively allocates non-uniform compression rates to the modules in that layer while adhering to a target layer-wise compression ratio, further enhancing compression performance. Extensive experiments demonstrate that CPSVD consistently outperforms state-of-the-art SVD-based LLM compression methods, achieving lower perplexity and higher accuracy on zero-shot tasks.
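The core column-splitting idea can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the paper's implementation: the function name, the per-column Frobenius error score, and the fixed rank/column budget are all assumptions (the paper instead uses data-driven thresholding and non-uniform rank allocation, and equalizes storage cost across methods, which this sketch does not).

```python
import numpy as np

def cpsvd_compress(W, rank, n_keep):
    """Illustrative column-preserving SVD (hypothetical helper, n_keep >= 1):
    keep the n_keep columns with the largest rank-`rank` reconstruction
    error verbatim, and apply truncated SVD only to the remaining
    low-error columns."""
    # Score each column by its error under a rank-`rank` SVD of the full matrix.
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    W_approx = U[:, :rank] @ np.diag(S[:rank]) @ Vt[:rank, :]
    col_err = np.linalg.norm(W - W_approx, axis=0)

    keep_idx = np.argsort(col_err)[-n_keep:]   # high-error columns: preserved
    svd_idx = np.argsort(col_err)[:-n_keep]    # low-error columns: decomposed

    # Truncated SVD applied only to the low-error submatrix.
    U2, S2, Vt2 = np.linalg.svd(W[:, svd_idx], full_matrices=False)
    W_low_approx = U2[:, :rank] @ np.diag(S2[:rank]) @ Vt2[:rank, :]

    # Reassemble: exact high-error columns + low-rank low-error columns.
    W_hat = np.empty_like(W)
    W_hat[:, keep_idx] = W[:, keep_idx]
    W_hat[:, svd_idx] = W_low_approx
    return W_hat

# A matrix with a few dominant "hard" columns shows the benefit:
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
W[:, :4] *= 10  # these columns dominate the SVD approximation error

err_cpsvd = np.linalg.norm(W - cpsvd_compress(W, rank=16, n_keep=4))
U, S, Vt = np.linalg.svd(W, full_matrices=False)
err_svd = np.linalg.norm(W - U[:, :16] @ np.diag(S[:16]) @ Vt[:16, :])
assert err_cpsvd <= err_svd  # preserving hard columns never increases error
```

At the same rank, the split is guaranteed not to increase error: the preserved columns are reproduced exactly, and the truncated SVD of the low-error submatrix is the optimal rank-limited approximation of those columns. Note that the two variants here store different numbers of parameters; the paper's contribution is finding the split and per-module ranks that minimize error at a *fixed* compression ratio.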