Less is More: Resource-Efficient Low-Rank Adaptation

📅 2025-11-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address high parameter redundancy, substantial computational overhead, and cross-layer parameter interference in LoRA-based fine-tuning of large language models, this paper proposes EffiLoRA. Methodologically, EffiLoRA is the first to jointly tackle inter-layer and intra-layer redundancy: it introduces a low-rank update matrix A shared across layers and incorporates a runtime dynamic sparsity selection mechanism to sparsely update the per-layer B matrices on demand. By unifying low-rank decomposition, parameter sharing, and dynamic sparse updates, EffiLoRA significantly reduces trainable parameters (up to 72% reduction) and FLOPs. Extensive experiments demonstrate that EffiLoRA consistently outperforms standard LoRA across diverse tasks—including commonsense reasoning, vision-language instruction tuning, and image generation—while preserving or improving model performance, enhancing resource efficiency, and boosting robustness. Moreover, EffiLoRA is readily extensible to multimodal architectures and diffusion models.
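The sharing idea in the summary can be sketched as follows: each frozen layer weight W gets a LoRA update B·A, but a single A matrix is reused by every layer while each layer keeps its own B. This is an illustrative guess at EffiLoRA's structure, not the authors' code; the dimensions, names, and the resulting parameter count are assumptions for the sketch (the paper's reported 72% reduction depends on the actual model and rank).

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n_layers = 16, 4, 3  # toy hidden size, LoRA rank, layer count

W = [rng.standard_normal((d, d)) for _ in range(n_layers)]  # frozen pretrained weights
A_shared = rng.standard_normal((r, d)) * 0.01               # one A matrix reused by all layers
B = [np.zeros((d, r)) for _ in range(n_layers)]             # per-layer B, zero init => no-op update at start

def forward(x, layer):
    # Adapted layer: W x + B (A_shared x)
    return W[layer] @ x + B[layer] @ (A_shared @ x)

# Trainable parameters: the shared A counted once, plus every per-layer B.
n_trainable = A_shared.size + sum(b.size for b in B)
# Standard LoRA baseline: a separate (A, B) pair in every layer.
n_lora_baseline = n_layers * (A_shared.size + B[0].size)
```

With these toy dimensions the shared-A variant trains 256 parameters versus 384 for per-layer LoRA; the savings grow with the number of adapted layers, since A's cost is amortized across all of them.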

📝 Abstract
Low-Rank Adaptation (LoRA) is a widely adopted parameter-efficient fine-tuning (PEFT) method for Large Language Models (LLMs), but it still incurs notable overhead and suffers from parameter interference in complex datasets. While recent works decouple LoRA update matrices to exploit matrix-wise asymmetry, training costs remain high. We revisit LoRA from the perspective of inter-matrix and intra-layer parameter redundancy and propose Resource-Efficient Low-Rank Adaptation, EffiLoRA, a lightweight and generalizable approach for language, multimodal, and diffusion models. EffiLoRA employs a unified A matrix across all transformer layers and introduces a runtime selective B matrices update to dynamically trade off the system resource budget and model performance. EffiLoRA consistently outperforms LoRA across diverse modalities, including commonsense reasoning, visual instruction tuning, and image generation, demonstrating improved efficiency and robustness.
Problem

Research questions and friction points this paper is trying to address.

Reduces parameter redundancy in LoRA for efficient fine-tuning
Dynamically balances resource usage and model performance
Enhances efficiency across language, multimodal, and diffusion models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified A matrix across all transformer layers
Runtime selective B matrices update for dynamic trade-off
Reduces parameter redundancy and improves efficiency
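The "runtime selective B matrices update" above can be sketched as a budgeted top-k selection: at each step, score every layer's B matrix and update only the most important ones, leaving the rest untouched. The gradient-norm score and the fixed per-step budget are assumptions for this sketch; the paper's actual selection criterion may differ.

```python
import numpy as np

rng = np.random.default_rng(1)
n_layers, d, r = 6, 16, 4
B = [rng.standard_normal((d, r)) * 0.01 for _ in range(n_layers)]
grad_B = [rng.standard_normal((d, r)) for _ in range(n_layers)]  # stand-in gradients

def selective_update(B, grad_B, budget, lr=0.1):
    """Update only the `budget` B matrices with the largest gradient norms.

    Skipped layers keep their current B, saving optimizer compute and
    gradient memory for that step.
    """
    scores = [np.linalg.norm(g) for g in grad_B]      # per-layer importance score
    chosen = np.argsort(scores)[-budget:]             # top-k layers under the budget
    for i in chosen:
        B[i] = B[i] - lr * grad_B[i]                  # plain SGD step on selected layers
    return set(int(i) for i in chosen)

updated = selective_update(B, grad_B, budget=2)
```

Raising or lowering `budget` at runtime is what trades resource usage against adaptation quality: a smaller budget touches fewer B matrices per step, so fewer gradients and optimizer states need to be materialized.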
Chunlin Tian — University of Macau (MLSys)
Xuyang Wei — University of Macau
Huanrong Liu — University of Macau
Zhijiang Guo — HKUST (GZ) | HKUST (Natural Language Processing, Machine Learning, Large Language Models)
Li Li — University of Macau