SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression

📅 2024-03-12
🏛️ arXiv.org
📈 Citations: 36
Influential: 8
📄 PDF
🤖 AI Summary
The enormous parameter counts of large language models (LLMs) make them difficult to deploy, and existing singular value decomposition (SVD)-based compression methods suffer from two key limitations: substantial truncation error and static, un-updated weight matrices after compression. This paper proposes SVD-LLM, a truncation-aware SVD compression framework. First, it introduces data whitening to establish a direct mapping between singular values and reconstruction error, so that truncating the smallest singular values provably minimizes compression loss. Second, it applies sequential low-rank parameter updates after SVD truncation to compensate for the resulting accuracy degradation. By jointly integrating truncation awareness with post-training low-rank updates, the work departs from conventional static SVD compression paradigms. Experiments across 10 benchmark datasets and 7 LLMs spanning diverse architectures and scales demonstrate that, at high compression ratios, the proposed method reduces average accuracy loss by 38% compared with state-of-the-art approaches.

📝 Abstract
The advancements in Large Language Models (LLMs) have been hindered by their substantial sizes, which necessitate LLM compression methods for practical deployment. Singular Value Decomposition (SVD) offers a promising solution for LLM compression. However, state-of-the-art SVD-based LLM compression methods have two key limitations: truncating smaller singular values may lead to higher compression loss, and the compressed weights are not updated after SVD truncation. In this work, we propose SVD-LLM, an SVD-based post-training LLM compression method that addresses the limitations of existing methods. SVD-LLM incorporates a truncation-aware data whitening technique to ensure a direct mapping between singular values and compression loss. Moreover, SVD-LLM adopts a parameter update with sequential low-rank approximation to compensate for the accuracy degradation after SVD compression. We evaluate SVD-LLM on 10 datasets and 7 models from three different LLM families at three different scales. Our results demonstrate the superiority of SVD-LLM over state-of-the-art methods, especially at high model compression ratios. Our code is available at https://github.com/AIoT-MLSys-Lab/SVD-LLM
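At its core, SVD-based compression replaces a full weight matrix with two low-rank factors obtained by truncated SVD. The following is a minimal NumPy sketch of plain (whitening-free) truncated-SVD compression; the shapes, rank, and function name are illustrative, not taken from the paper's code:

```python
import numpy as np

def truncated_svd_compress(W: np.ndarray, k: int):
    """Return low-rank factors (A, B) with W ~= A @ B, keeping the
    top-k singular values."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :k] * S[:k]   # (m, k): left vectors scaled by singular values
    B = Vt[:k, :]          # (k, n): right singular vectors
    return A, B

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512))
A, B = truncated_svd_compress(W, k=64)

# Storage: two thin factors instead of one dense matrix.
original = W.size            # 512 * 512 = 262144 parameters
compressed = A.size + B.size # 2 * 512 * 64 = 65536 parameters
print(original, compressed)
```

Keeping rank k = 64 for a 512x512 matrix stores 65,536 parameters instead of 262,144, a 4x reduction; SVD-LLM's contributions lie in how the truncation is made loss-aware and how the factors are updated afterwards.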
Problem

Research questions and friction points this paper is trying to address.

High compression loss from truncating smaller singular values in SVD-based LLM compression
Lack of weight updates after SVD truncation, leaving accuracy degradation uncompensated
Poor accuracy of existing SVD-based methods at high compression ratios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Truncation-aware data whitening technique
Sequential low-rank approximation update
SVD-based post-training LLM compression
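The truncation-aware whitening idea can be sketched as follows. Under the simplifying assumptions that a layer computes y = W x and that S is the lower-triangular Cholesky factor of the calibration Gram matrix X Xᵀ (variable names and shapes here are illustrative, not the paper's implementation), truncating the singular values of W S maps exactly to the layer's output reconstruction error:

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_out, n = 128, 128, 1024
W = rng.standard_normal((d_out, d_in))  # layer weight, y = W x
X = rng.standard_normal((d_in, n))      # calibration activations

# Whitening matrix: lower-triangular S with S @ S.T == X @ X.T,
# so the whitened activations S^{-1} X have identity second moment.
S = np.linalg.cholesky(X @ X.T)

# SVD of the whitened weight: truncating these singular values now
# corresponds directly to output error on the calibration data.
U, sv, Vt = np.linalg.svd(W @ S, full_matrices=False)
k = 32
W_k = (U[:, :k] * sv[:k]) @ Vt[:k, :] @ np.linalg.inv(S)

# Squared output error equals the energy of the discarded singular
# values, up to floating-point precision.
err = np.linalg.norm(W @ X - W_k @ X, "fro") ** 2
tail = float(np.sum(sv[k:] ** 2))
print(err, tail)
```

Because S⁻¹X has orthonormal rows by construction, ||W X − W_k X||²_F collapses to the sum of the truncated singular values squared, which is what makes discarding the smallest singular values provably loss-minimizing on the calibration data.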
Authors
Xin Wang — The Ohio State University
Yu Zheng — Michigan State University
Zhongwei Wan — The Ohio State University, PhD student (LLM, Multimodal, NLP)
Mi Zhang — The Ohio State University