🤖 AI Summary
To address high parameter redundancy, substantial computational overhead, and cross-layer parameter interference in LoRA-based fine-tuning of large language models, this paper proposes EffiLoRA. Methodologically, EffiLoRA is the first to jointly tackle inter-matrix and intra-layer redundancy: it shares a single low-rank update matrix A across transformer layers and adds a runtime dynamic sparsity selection mechanism that updates the per-layer B matrices on demand. By unifying low-rank decomposition, parameter sharing, and dynamic sparse updates, EffiLoRA significantly reduces trainable parameters (up to 72% reduction) and FLOPs. Extensive experiments demonstrate that EffiLoRA consistently outperforms standard LoRA across diverse tasks—including commonsense reasoning, vision-language instruction tuning, and image generation—while preserving or improving model performance, enhancing resource efficiency, and boosting robustness. Moreover, EffiLoRA is readily extensible to multimodal architectures and diffusion models.
📝 Abstract
Low-Rank Adaptation (LoRA) is a widely adopted parameter-efficient fine-tuning (PEFT) method for Large Language Models (LLMs), but it still incurs notable overhead and suffers from parameter interference on complex datasets. While recent works decouple LoRA update matrices to exploit matrix-wise asymmetry, training costs remain high. We revisit LoRA from the perspective of inter-matrix and intra-layer parameter redundancy and propose Resource-Efficient Low-Rank Adaptation, EffiLoRA, a lightweight and generalizable approach for language, multimodal, and diffusion models. EffiLoRA employs a unified A matrix across all transformer layers and introduces a runtime selective B-matrix update to dynamically trade off the system resource budget against model performance. EffiLoRA consistently outperforms LoRA across diverse modalities, including commonsense reasoning, visual instruction tuning, and image generation, demonstrating improved efficiency and robustness.
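The two ideas in the abstract—one A matrix shared by all layers, plus a runtime selection of which per-layer B matrices to update—can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the shapes, the zero-initialization of B, and the round-robin selection rule are assumptions for exposition (the paper's actual selection criterion is not given here).

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n_layers = 16, 4, 6  # hidden size, LoRA rank, number of layers (toy values)

# A single A matrix shared across all layers, as the abstract describes;
# B is per-layer and zero-initialized so the update starts at zero (standard LoRA).
A = rng.normal(scale=0.01, size=(r, d))
B = [np.zeros((d, r)) for _ in range(n_layers)]

def lora_delta(x, layer):
    """Low-rank update for one layer: x @ (B_l @ A).T with the shared A."""
    return x @ (B[layer] @ A).T

def select_active_B(budget, step):
    """Placeholder runtime selection: a round-robin subset of B matrices.
    EffiLoRA's real criterion is dynamic and resource-aware; this stand-in
    only shows that a 'budget' limits how many B matrices train per step."""
    return [(step + i) % n_layers for i in range(budget)]

x = rng.normal(size=(2, d))
active = select_active_B(budget=2, step=0)
# In training, only B[l] for l in `active` would receive gradients this step;
# the shared A and all frozen B matrices stay fixed, cutting trainable parameters.
```

Because B starts at zero, the adapted model initially matches the base model exactly; the budget argument is where the resource/performance trade-off the abstract mentions would be exposed.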