🤖 AI Summary
Existing LoRA-based parameter-efficient fine-tuning (PEFT) underperforms full fine-tuning (Full FT) on large language models (LLMs). Two factors drive this gap: static SVD initialization, which limits the transfer of pretrained knowledge, and the weight misalignment and complex gradient dynamics of Mixture-of-Experts (MoE) architectures, which hinder the effective use of SVD priors.
Method: We propose an adaptive, singular-value-guided MoE alignment framework. Our approach introduces a novel SVD-structured MoE architecture and incorporates theoretically grounded scaling factors that jointly optimize alignment in both gradient and parameter spaces, without altering the model architecture or training algorithm.
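To make the SVD-structured MoE initialization concrete, below is a minimal PyTorch sketch of one plausible reading: each LoRA expert's low-rank factors are seeded from a different segment of the pretrained weight's singular value decomposition, so the experts collectively cover distinct singular-value priors. The function name `init_svd_moe_experts` and the even, contiguous segment split are illustrative assumptions, not the paper's released code.

```python
import torch

def init_svd_moe_experts(W: torch.Tensor, num_experts: int, rank: int):
    """Illustrative sketch: carve the SVD of a pretrained weight W (out x in)
    into per-expert low-rank factors (A_i, B_i), so each LoRA expert starts
    from a different singular-value segment rather than a static top-k subset."""
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    assert num_experts * rank <= S.numel(), "not enough singular values to split"
    experts = []
    for i in range(num_experts):
        lo, hi = i * rank, (i + 1) * rank
        # Split each segment's singular values evenly between the two factors,
        # a common convention in SVD-based LoRA initialization.
        sqrt_S = torch.sqrt(S[lo:hi])
        B = sqrt_S[:, None] * Vh[lo:hi, :]   # (rank, in): input -> rank space
        A = U[:, lo:hi] * sqrt_S[None, :]    # (out, rank): rank space -> output
        experts.append((A, B))
    return experts
```

In SVD-initialized LoRA variants, the frozen base weight is typically adjusted so that base plus adapter reproduces the pretrained weight at step zero; whether and how GOAT applies such a residual correction is not spelled out in this summary.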
Contribution/Results: Evaluated on 25 datasets spanning natural language understanding, commonsense reasoning, image classification, and natural language generation, our method achieves Full FT-level performance, substantially closing the gap with full fine-tuning and establishing new state-of-the-art (SOTA) results among LoRA-based PEFT methods.
📄 Abstract
While Low-Rank Adaptation (LoRA) enables parameter-efficient fine-tuning for Large Language Models (LLMs), its performance often falls short of Full Fine-Tuning (Full FT). Current methods optimize LoRA by initializing with static singular value decomposition (SVD) subsets, leading to suboptimal leveraging of pre-trained knowledge. Another path for improving LoRA is incorporating a Mixture-of-Experts (MoE) architecture. However, weight misalignment and complex gradient dynamics make it challenging to adopt SVD priors in the LoRA MoE architecture. To mitigate these issues, we propose Great LoRA Mixture-of-Expert (GOAT), a framework that (1) adaptively integrates relevant priors using an SVD-structured MoE, and (2) aligns optimization with full fine-tuned MoE by deriving a theoretical scaling factor. We demonstrate that proper scaling, without modifying the architecture or training algorithms, boosts LoRA MoE's efficiency and performance. Experiments across 25 datasets, including natural language understanding, commonsense reasoning, image classification, and natural language generation, demonstrate GOAT's state-of-the-art performance, closing the gap with Full FT.
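As a rough illustration of where the derived scaling factor enters, the sketch below shows a generic top-k routed LoRA-MoE layer whose aggregated low-rank update is multiplied by a scalar `s` before being added to the frozen base output. The routing scheme, the class name `LoRAMoELayer`, and the placeholder value `alpha / rank` are assumptions for illustration only; GOAT derives its actual scaling factor theoretically to align optimization with a full fine-tuned MoE.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoRAMoELayer(nn.Module):
    """Illustrative LoRA-MoE forward pass: top-k routed low-rank experts whose
    combined update is scaled by `s` before augmenting the frozen base output.
    The value alpha/rank below is a classic LoRA-style placeholder, not the
    theoretically derived factor from the paper."""
    def __init__(self, in_dim, out_dim, num_experts=4, rank=8, top_k=2, alpha=16.0):
        super().__init__()
        self.router = nn.Linear(in_dim, num_experts, bias=False)
        # Per-expert LoRA factors; A starts at zero so the initial update is zero.
        self.B = nn.Parameter(torch.randn(num_experts, in_dim, rank) * 0.02)
        self.A = nn.Parameter(torch.zeros(num_experts, rank, out_dim))
        self.top_k = top_k
        self.s = alpha / rank  # placeholder for the derived scaling factor

    def forward(self, x, base_out):
        # x: (batch, in_dim); base_out: (batch, out_dim) from the frozen layer.
        gate = F.softmax(self.router(x), dim=-1)            # (batch, num_experts)
        weight, idx = gate.topk(self.top_k, dim=-1)         # (batch, top_k)
        weight = weight / weight.sum(dim=-1, keepdim=True)  # renormalize gates
        delta = torch.zeros_like(base_out)
        for j in range(self.top_k):
            e = idx[:, j]                                   # expert id per sample
            h = torch.einsum('bi,bir->br', x, self.B[e])    # (batch, rank)
            delta += weight[:, j, None] * torch.einsum('br,bro->bo', h, self.A[e])
        return base_out + self.s * delta                    # scaled MoE update
```

Because `s` multiplies every expert's contribution, it rescales the adapter gradients as well, which is why a properly chosen factor can change optimization behavior without touching the architecture or the training algorithm.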