🤖 AI Summary
To address the performance degradation caused by ignoring knowledge relevance and introducing noisy knowledge during parameter-efficient fine-tuning (PEFT) of large language models (LLMs), this paper proposes Knowledge-aware Singular-value Adaptation (KaSA). KaSA embeds task-aware knowledge-importance modeling directly into the singular value decomposition (SVD) process, enabling fine-grained, interpretable knowledge gating via learnable singular-value scaling that dynamically activates task-relevant knowledge within low-rank updates. The approach is architecture-agnostic and combines knowledge distillation with a task-relevance scoring mechanism. Extensive experiments across 16 standard benchmarks and 4 synthetic datasets demonstrate that KaSA consistently outperforms full-parameter fine-tuning and 14 state-of-the-art PEFT methods, achieving an average accuracy gain of 2.1% on natural language understanding, natural language generation, instruction following, and commonsense reasoning tasks.
📝 Abstract
The increasing size of large language models (LLMs) results in significant computational overhead and memory usage when adapting these models to specific tasks or domains. Various parameter-efficient fine-tuning (PEFT) methods have been devised to mitigate these challenges by training a small set of parameters for the task-specific updates of the model weights. Among PEFT methods, LoRA stands out for its simplicity and efficiency, inspiring the development of a series of variants. However, LoRA and its successors disregard knowledge that is noisy or irrelevant to the target task, which detrimentally impacts model performance and leads to suboptimal results. To address this limitation, we introduce Knowledge-aware Singular-value Adaptation (KaSA), a PEFT method that leverages singular value decomposition (SVD) with knowledge-aware singular values to dynamically activate knowledge based on its relevance to the task at hand. We conduct extensive experiments across a range of LLMs on tasks spanning natural language understanding (NLU), natural language generation (NLG), instruction following, and commonsense reasoning. The experimental results demonstrate that KaSA consistently outperforms full fine-tuning (FFT) and 14 popular PEFT baselines across 16 benchmarks and 4 synthetic datasets, underscoring our method's efficacy and adaptability. The source code of our method is available at https://github.com/juyongjiang/KaSA.
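To make the core idea concrete, the following is a minimal numpy sketch of an SVD-based low-rank update with learnable singular-value gating, in the spirit of the knowledge-aware adaptation described above. This is an illustrative assumption, not the authors' exact formulation; the function name `kasa_delta` and the gating scheme are hypothetical simplifications.

```python
import numpy as np

def kasa_delta(W, rank, gate):
    """Illustrative sketch (not the paper's exact method): decompose a
    frozen weight matrix with SVD, keep the top-`rank` components, and
    rescale their singular values with a learnable gate vector that can
    amplify task-relevant knowledge directions or suppress noisy ones.

    W    : (m, n) frozen weight matrix
    rank : number of singular directions kept in the low-rank update
    gate : (rank,) learnable scaling factors (trained by the optimizer;
           here just passed in as an array)
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U_r, s_r, Vt_r = U[:, :rank], s[:rank], Vt[:rank, :]
    # gate == 0 deactivates a knowledge direction; gate == 1 keeps it as-is
    return U_r @ np.diag(gate * s_r) @ Vt_r
```

With an all-ones gate and full rank, the update reproduces the original matrix (since truncated SVD at full rank is exact); with an all-zeros gate, the update vanishes, corresponding to fully suppressed knowledge. During fine-tuning, only the small gate vector (and, in richer variants, the low-rank factors) would be trained while `W` stays frozen.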