🤖 AI Summary
Large language models (LLMs) are difficult to deploy because of their excessive parameter counts, and aggressive compression often introduces severe distortion and performance degradation. To address this, the paper proposes a theory-driven hierarchical singular value decomposition (SVD) compression method with two key innovations: (1) an explicit truncation-loss modeling framework that enables layer-wise adaptive rank selection, and (2) a loss-aware singular value pruning strategy that improves truncation stability and reconstruction accuracy. Extensive experiments across five mainstream LLMs—including Llama and Qwen—and ten benchmark tasks show that the method consistently outperforms existing SVD-based compression techniques, achieving an average perplexity reduction of 3.2%. The implementation is publicly available.
📝 Abstract
Despite significant advancements, the practical deployment of Large Language Models (LLMs) is often hampered by their immense size, highlighting the need for effective compression techniques. Singular Value Decomposition (SVD) is a promising LLM compression technique. However, existing SVD-based compression methods fall short in reducing truncation loss, leading to less competitive performance in compressed models. In this work, we introduce SVD-LLM V2, an SVD-based LLM compression method that optimizes singular value truncation with two techniques. First, SVD-LLM V2 uses the theoretical truncation loss of each weight matrix to assign it a layer-specific compression ratio, accommodating the heterogeneous weight redundancy across layers. Second, SVD-LLM V2 applies loss-optimized weight truncation so that the truncated singular values yield a lower and more stable truncation loss in practice. We evaluate SVD-LLM V2 on ten datasets and five LLMs at various scales. Our results show that SVD-LLM V2 outperforms state-of-the-art SVD-based LLM compression methods. Our code is available at https://github.com/AIoT-MLSys-Lab/SVD-LLM
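The core idea—measuring the theoretical truncation loss of each weight matrix and using it to pick a per-matrix rank—can be illustrated with a minimal NumPy sketch. Note this is an illustrative approximation, not the paper's actual algorithm: the energy-retention rule, function names, and threshold below are assumptions for demonstration; by the Eckart–Young theorem, the squared Frobenius error of a rank-k SVD truncation equals the energy in the discarded singular values.

```python
import numpy as np

def truncate_svd(W, k):
    """Rank-k SVD approximation of a weight matrix W.
    Returns the compressed matrix and the full singular value spectrum."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    W_k = (U[:, :k] * s[:k]) @ Vt[:k]
    return W_k, s

def theoretical_loss(s, k):
    """Squared-Frobenius truncation loss: the energy carried by the
    discarded singular values s[k:]."""
    return float(np.sum(s[k:] ** 2))

def pick_rank(s, energy_keep=0.95):
    """Illustrative per-matrix rank rule (an assumption, not the paper's):
    the smallest rank retaining a target fraction of spectral energy.
    Matrices with faster-decaying spectra (more redundancy) get smaller
    ranks, so the effective compression ratio adapts per matrix."""
    cum = np.cumsum(s ** 2) / np.sum(s ** 2)
    return int(np.searchsorted(cum, energy_keep)) + 1

# Demo: a matrix with true rank <= 8 is assigned a small rank, and the
# measured reconstruction error matches the theoretical truncation loss.
rng = np.random.default_rng(0)
W = rng.standard_normal((32, 8)) @ rng.standard_normal((8, 32))
_, s, _ = np.linalg.svd(W, full_matrices=False)
k = pick_rank(s, energy_keep=0.99)
W_k, s = truncate_svd(W, k)
empirical = np.linalg.norm(W - W_k) ** 2
predicted = theoretical_loss(s, k)
```

Because the predicted loss is available before any truncation is performed, ranks (and thus compression ratios) can be compared and allocated across all layers' weight matrices up front, which is the spirit of the paper's layer-adaptive ratio assignment.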