🤖 AI Summary
To address the high memory overhead and accuracy degradation of large language models (LLMs) deployed on resource-constrained devices, this paper proposes AdaSVD, an adaptive singular value decomposition (SVD) compression framework. Moving beyond conventional uniform truncation strategies, AdaSVD introduces adaComp, an adaptive truncation-error compensation mechanism, and adaCR, a layer-wise importance-aware compression-ratio allocation scheme. Layer importance is quantified jointly via gradient sensitivity and singular-value energy distribution, while alternating optimization of the U and Vᵀ matrices mitigates accumulated truncation errors. Extensive evaluation across diverse LLMs, including Llama and Qwen, demonstrates that AdaSVD achieves, on average, a 42% reduction in memory footprint and a 68% decrease in accuracy loss compared with state-of-the-art SVD-based methods, while maintaining over 98% of original task performance.
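The layer-wise ratio allocation can be pictured with a minimal sketch. This is an illustrative proportional scheme under assumed importance scores, not the paper's actual adaCR criterion; the function name, budget, and clipping bounds are all our own placeholders:

```python
import numpy as np

def allocate_ratios(importance, budget=0.5, lo=0.2, hi=0.8):
    """Hypothetical layer-wise ratio allocation (not the paper's adaCR).

    Layers with higher importance scores keep a larger fraction of their
    parameters; ratios are scaled around a global `budget` and clipped
    to [lo, hi] so no layer is compressed too aggressively or too little.
    """
    imp = np.asarray(importance, dtype=float)
    raw = imp / imp.mean() * budget  # more important -> larger ratio
    return np.clip(raw, lo, hi)

# Four layers with assumed importance scores; the second layer is
# deemed most important and is therefore compressed least.
ratios = allocate_ratios([1.0, 2.0, 0.5, 0.5], budget=0.5)
print(ratios)
```

Note that after clipping, the average ratio may drift slightly from the budget; a real allocator would renormalize or solve for the budget exactly.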
📝 Abstract
Large language models (LLMs) have achieved remarkable success in natural language processing (NLP) tasks, yet their substantial memory requirements present significant challenges for deployment on resource-constrained devices. Singular Value Decomposition (SVD) has emerged as a promising compression technique for LLMs, offering considerable reductions in memory overhead. However, existing SVD-based methods often struggle to effectively mitigate the errors introduced by SVD truncation, leading to a noticeable performance gap when compared to the original models. Furthermore, applying a uniform compression ratio across all transformer layers fails to account for the varying importance of different layers. To address these challenges, we propose AdaSVD, an adaptive SVD-based LLM compression approach. Specifically, AdaSVD introduces adaComp, which adaptively compensates for SVD truncation errors by alternately updating the singular matrices U and V^T. Additionally, AdaSVD introduces adaCR, which adaptively assigns layer-specific compression ratios based on the relative importance of each layer. Extensive experiments across multiple LLM families and evaluation metrics demonstrate that AdaSVD consistently outperforms state-of-the-art (SOTA) SVD-based methods, achieving superior performance with significantly reduced memory requirements. The code and models will be available at https://github.com/ZHITENGLI/AdaSVD.
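The basic mechanism that AdaSVD builds on can be sketched in a few lines: factor a weight matrix W via SVD, keep only the top-r singular directions, and store the two thin factors instead of W. This is a generic truncated-SVD sketch, not the paper's adaComp compensation step; the rank-selection rule here is a simple parameter-count budget of our own choosing:

```python
import numpy as np

def svd_compress(W: np.ndarray, ratio: float):
    """Compress W (m x n) into a rank-r factorization A @ B.

    `ratio` is the target fraction of W's parameter count; r is chosen
    so the factors' combined size r*(m + n) stays within ratio * m * n.
    """
    m, n = W.shape
    r = max(1, int(ratio * m * n / (m + n)))
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :r] * S[:r]  # (m, r): singular values folded into U
    B = Vt[:r, :]         # (r, n)
    return A, B

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512))    # stand-in for a weight matrix
A, B = svd_compress(W, ratio=0.5)      # keep 50% of the parameters
err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
print(f"params kept: {(A.size + B.size) / W.size:.0%}, rel. error: {err:.3f}")
```

Plain truncation like this leaves exactly the residual error the abstract describes; adaComp's contribution is to shrink that residual by alternately refining U and Vᵀ after truncation, and adaCR's is to vary `ratio` per layer.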