🤖 AI Summary
To address high parameter redundancy, substantial computational overhead, and cross-layer parameter interference in LoRA-based fine-tuning of large language models, this paper proposes EffiLoRA. Methodologically, EffiLoRA is the first to jointly tackle inter-matrix and intra-layer redundancy: it shares a single low-rank update matrix A across transformer layers and adds a runtime dynamic sparsity selection mechanism that updates the per-layer B matrices on demand. By unifying low-rank decomposition, parameter sharing, and dynamic sparse updates, EffiLoRA significantly reduces trainable parameters (up to 72% reduction) and FLOPs. Extensive experiments demonstrate that EffiLoRA consistently outperforms standard LoRA across diverse tasks—including commonsense reasoning, vision-language instruction tuning, and image generation—while preserving or improving model performance, enhancing resource efficiency, and boosting robustness. Moreover, EffiLoRA is readily extensible to multimodal architectures and diffusion models.
📝 Abstract
Low-Rank Adaptation (LoRA) is a widely adopted parameter-efficient fine-tuning (PEFT) method for Large Language Models (LLMs), but it still incurs notable overhead and suffers from parameter interference on complex datasets. While recent works decouple LoRA update matrices to exploit matrix-wise asymmetry, training costs remain high. We revisit LoRA from the perspective of inter-matrix and intra-layer parameter redundancy and propose Resource-Efficient Low-Rank Adaptation, EffiLoRA, a lightweight and generalizable approach for language, multimodal, and diffusion models. EffiLoRA employs a unified A matrix across all transformer layers and introduces a runtime selective B-matrix update to dynamically trade off the system resource budget against model performance. EffiLoRA consistently outperforms LoRA across diverse modalities, including commonsense reasoning, visual instruction tuning, and image generation, demonstrating improved efficiency and robustness.
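The two ideas in the abstract—one A matrix shared by all layers, plus a runtime selection of which per-layer B matrices to update—can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the shapes, the zero-initialization of B, and the round-robin selection rule are assumptions for exposition (the paper's actual selection criterion is not given here).

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n_layers = 16, 4, 6  # hidden size, LoRA rank, number of layers (toy values)

# A single A matrix shared across all layers, as the abstract describes;
# B is per-layer and zero-initialized so the update starts at zero (standard LoRA).
A = rng.normal(scale=0.01, size=(r, d))
B = [np.zeros((d, r)) for _ in range(n_layers)]

def lora_delta(x, layer):
    """Low-rank update for one layer: x @ (B_l @ A).T with the shared A."""
    return x @ (B[layer] @ A).T

def select_active_B(budget, step):
    """Placeholder runtime selection: a round-robin subset of B matrices.
    EffiLoRA's real criterion is dynamic and resource-aware; this stand-in
    only shows that a 'budget' limits how many B matrices train per step."""
    return [(step + i) % n_layers for i in range(budget)]

x = rng.normal(size=(2, d))
active = select_active_B(budget=2, step=0)
# In training, only B[l] for l in `active` would receive gradients this step;
# the shared A and all frozen B matrices stay fixed, cutting trainable parameters.
```

Because B starts at zero, the adapted model initially matches the base model exactly; the budget argument is where the resource/performance trade-off the abstract mentions would be exposed.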