Zero Sum SVD: Balancing Loss Sensitivity for Low Rank LLM Compression

📅 2026-02-02
🤖 AI Summary
This work addresses the problem of allocating truncation ranks across weight matrices in low-rank post-training compression of large language models: under a global compression ratio constraint, ranks must be chosen to minimize performance degradation. The authors propose a global singular-component selection method based on activation whitening and first-order, gradient-based loss estimation, which measures the importance of components from all matrices in a common whitened coordinate system. A zero-sum rule, which prunes components so that the cumulative predicted loss change stays near zero, yields heterogeneous rank allocation automatically, without solving an explicit rank-allocation optimization. A single-step projected gradient correction is further incorporated to improve post-compression performance. Experiments demonstrate that the proposed approach consistently outperforms existing techniques across model architectures and compression ratios, achieving significantly lower accuracy loss.
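The core idea above, scoring singular components of all matrices on a common scale and letting a global selection rule produce heterogeneous per-matrix ranks, can be illustrated with a simplified sketch. This is not the paper's exact zero-sum rule (the authors balance predicted loss changes in whitened coordinates); here we assume a precomputed per-component importance score per matrix and keep the globally highest-scoring components under a parameter budget. `global_rank_allocation` and the scoring interface are hypothetical names for illustration.

```python
import numpy as np

def global_rank_allocation(weights, scores, target_ratio):
    """Simplified sketch (not the paper's exact rule): pool the singular
    components of every weight matrix, each with an assumed precomputed
    importance score, and keep the globally best ones until the parameter
    budget implied by target_ratio is exhausted."""
    # Collect (score, matrix name, component index, parameter cost).
    pool = []
    for name, W in weights.items():
        m, n = W.shape
        cost = m + n  # one rank-1 factor stores a column of U and a row of V^T
        for i, s in enumerate(scores[name]):
            pool.append((s, name, i, cost))
    total = sum(W.size for W in weights.values())
    budget = target_ratio * total
    # Greedily keep the highest-scoring components that fit the budget;
    # per-matrix ranks come out heterogeneous automatically.
    pool.sort(reverse=True)
    ranks = {name: 0 for name in weights}
    used = 0
    for s, name, i, cost in pool:
        if used + cost <= budget:
            ranks[name] += 1
            used += cost
    return ranks
```

With two equally sized matrices, the matrix whose components carry higher scores receives a larger rank, which is the heterogeneous-allocation behavior the summary describes.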

📝 Abstract
Advances in large language models have driven strong performance across many tasks, but their memory and compute costs still hinder deployment. SVD-based compression reduces storage and can speed up inference via low-rank factors, yet performance depends on how rank is allocated under a global compression ratio. Prior methods often use homogeneous ranks for similarly sized matrices, despite large differences in loss sensitivity, or rely on expensive iterative pre-truncation optimization to determine per-matrix ranks. We propose Zero Sum SVD (ZS-SVD), a post-training method that performs global singular component selection using activation whitening and first-order calibration loss estimates in whitened coordinates. ZS-SVD prunes components across the whole model with a zero-sum rule that keeps the cumulative predicted loss change near zero, automatically yielding heterogeneous ranks without solving a rank-allocation optimization. Motivated by evidence that gradients near pretrained solutions exhibit low-rank structure, we also introduce an optional lightweight correction that applies a single projected gradient update after truncation, followed by re-truncation. Extensive experiments across multiple LLM architectures show consistent gains across diverse benchmarks and compression ratios. Code is available at https://github.com/mint-vu/Zero-Sum-SVD
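The optional correction described in the abstract (truncate, take one projected gradient step, then re-truncate) can be sketched as follows. This is a minimal illustration, not the authors' implementation: `grad_fn` is an assumed callback returning the calibration-loss gradient at a given weight matrix, and the learning rate is arbitrary.

```python
import numpy as np

def truncate(W, r):
    """Rank-r truncation via SVD (best rank-r approximation in Frobenius norm)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

def one_step_projected_correction(W, r, grad_fn, lr=1e-2):
    """Sketch of the single-step correction: truncate to rank r, apply one
    gradient update using an assumed calibration-loss gradient callback,
    then re-truncate so the result stays rank r."""
    W_r = truncate(W, r)
    W_step = W_r - lr * grad_fn(W_r)  # single projected gradient update
    return truncate(W_step, r)        # re-truncation restores rank r
```

Because the gradient step can leave the rank-r manifold, the final `truncate` acts as the projection back onto it, which is what makes the update a projected gradient step.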
Problem

Research questions and friction points this paper is trying to address.

low-rank compression
SVD
loss sensitivity
rank allocation
large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Zero Sum SVD
low-rank compression
activation whitening
heterogeneous rank allocation
post-training compression
Ali Abbasi
CS Ph.D. Candidate, Vanderbilt University
Machine Learning, Computer Vision
Chayne Thrash
Department of Computer Science, Vanderbilt University, TN, USA
Haoran Qin
Department of Computer Science, Vanderbilt University, TN, USA
Shansita Sharma
Department of Computer Science, Vanderbilt University, TN, USA
Sepehr Seifi
Department of Computer Science, Vanderbilt University, TN, USA
Soheil Kolouri
Computer Science, Vanderbilt University, Nashville, TN
Machine Learning, Optimal Transport, Computer Vision