ARA: Adaptive Rank Allocation for Efficient Large Language Model SVD Compression

📅 2025-10-22
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Existing heuristic or mask-based training approaches for singular value decomposition (SVD)-based low-rank compression of large language models (LLMs) suffer from three key limitations: (1) reliance on local search strategies, (2) inability to explicitly model the relationship between singular value spectra and trainable parameters, and (3) neglect of non-smoothness in the gain function at compression ratio 1, leading to optimization traps. To address these issues, this paper proposes an adaptive rank allocation framework. It introduces a differentiable structured masking mechanism that explicitly maps rank retention to parameter updates, and incorporates a smoothed auxiliary loss to circumvent non-smooth critical points and avoid suboptimal local minima. Evaluated on LLaMA2-7B under 80% compression, the method achieves a WikiText2 perplexity of 6.42 (a reduction of 1.96), and improves zero-shot task accuracy by 9.72 percentage points on averageโ€”setting a new state-of-the-art for SVD-based LLM compression.

๐Ÿ“ Abstract
In the field of large language model (LLM) compression, singular value decomposition (SVD) is a widely studied and adopted low-rank decomposition technique. Since SVD operates exclusively on linear modules, and these modules in LLMs are separated by nonlinear components, SVD can only be applied independently to each linear module. Under a global compression ratio constraint, determining the appropriate rank for each linear module becomes a critical problem. Existing approaches, such as heuristic algorithms and mask-based training, have made progress in addressing this challenge. However, these methods still suffer from several limitations: heuristic algorithms explore the solution space within restricted regions, while mask-based training struggles to efficiently capture the relationship between singular value spectra and trainable parameters. More importantly, current methods overlook the key property that the gain function is non-smooth at a compression ratio of 1, which often leads the training process to suboptimal local minima. To address these issues, we propose an Adaptive Rank Allocation (ARA) method. Specifically, (1) ARA introduces a dedicated mask design that enables efficient mapping and updating between retained ranks and trainable parameters; and (2) it employs an additional loss function to guide parameter selection toward globally optimal solutions. Experimental results demonstrate that ARA achieves state-of-the-art performance. On the LLaMA2-7B model at an 80% compression ratio, ARA reduces perplexity on WikiText2 from 8.38 to 6.42 and improves average zero-shot task accuracy by 9.72 percentage points compared with uniform compression. These results highlight the effectiveness of our method for rank allocation in SVD-based LLM compression.
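As background for the rank-allocation problem the abstract describes, here is a minimal sketch of SVD-based low-rank compression of a single linear module under a parameter budget. All function names and the matrix size are illustrative, not from the paper.

```python
import numpy as np

def svd_compress(W: np.ndarray, rank: int):
    """Truncated-SVD factorization of a weight matrix W (d_out x d_in).

    Replaces W with two low-rank factors A (d_out x rank) and
    B (rank x d_in), so a linear layer y = W @ x becomes y = A @ (B @ x).
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]   # absorb singular values into A
    B = Vt[:rank, :]
    return A, B

def param_ratio(W: np.ndarray, rank: int) -> float:
    """Fraction of parameters kept after factorization."""
    d_out, d_in = W.shape
    return rank * (d_out + d_in) / (d_out * d_in)

# Example: one 512 x 512 module compressed to at most 80% of its parameters.
rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512))
r = int(0.8 * 512 * 512 / (512 + 512))   # largest rank meeting the budget
A, B = svd_compress(W, r)
```

Because each module's singular value spectrum decays at a different rate, applying the same ratio to every module (uniform compression) is generally suboptimal; deciding the per-module `rank` under the global budget is exactly the allocation problem ARA targets.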
Problem

Research questions and friction points this paper is trying to address.

Optimizing rank allocation for linear modules in LLM SVD compression
Overcoming limitations of heuristic and mask-based rank selection methods
Addressing non-smooth gain function issues in compression ratio optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive Rank Allocation method for SVD compression
Mask design mapping ranks to trainable parameters
Additional loss function guides global optimization
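The two contributions above can be sketched loosely as follows. This is an interpretation, not the paper's code: I assume the structured mask is a per-rank sigmoid gate applied to the singular values, and the auxiliary loss is a smooth penalty tying the gates to the global compression budget.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def masked_reconstruction(U, s, Vt, theta):
    """Gate each singular direction with a differentiable mask g_i = sigmoid(theta_i).

    A gate near 0 effectively drops rank i; a gate near 1 retains it,
    so the trainable parameters theta directly control the retained rank.
    """
    g = sigmoid(theta)
    return (U * (s * g)) @ Vt, g

def budget_loss(g, d_out, d_in, target_ratio):
    """Smooth auxiliary penalty steering the soft parameter count toward the budget."""
    kept = g.sum() * (d_out + d_in)   # expected parameter count of the two factors
    total = d_out * d_in
    return (kept / total - target_ratio) ** 2

rng = np.random.default_rng(1)
W = rng.standard_normal((64, 64))
U, s, Vt = np.linalg.svd(W, full_matrices=False)
theta = np.zeros_like(s)              # all gates start at sigmoid(0) = 0.5
W_hat, g = masked_reconstruction(U, s, Vt, theta)
loss = budget_loss(g, 64, 64, target_ratio=0.8)
```

Because the gates are continuous, both the reconstruction and the budget term are differentiable in `theta`, avoiding the hard rank-selection step that makes heuristic search local.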
Lin Xv — Shanghai Jiao Tong University
Jingsheng Gao — Shanghai Jiao Tong University
Xian Gao — Shanghai Jiao Tong University
Ting Liu — Shanghai Jiao Tong University
Yuzhuo Fu — Shanghai Jiao Tong University