🤖 AI Summary
This work identifies the computation of activation gradients during backpropagation as the critical compute bottleneck of LoRA fine-tuning, the first such characterization in the literature. To address it, we propose a lightweight fine-tuning framework that balances computational and memory efficiency through three components: (1) a sparsified approximate matrix multiplication that reduces the complexity of gradient computation; (2) a Double-LoRA dual-path gradient estimation mechanism that jointly suppresses the approximation error; and (3) a low-rank parameter update scheme. We theoretically establish an $\mathcal{O}(1/\sqrt{T})$ convergence rate. Experiments show that, compared to standard LoRA, our method substantially reduces FLOPs and training time, achieving up to 47% lower computational cost and 2.1× faster training while preserving near-identical model performance across multiple benchmark tasks.
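To make the bottleneck concrete, the sketch below traces the cost of a single LoRA layer's backward pass: the adapter-gradient paths are cheap thanks to the rank-$r$ factors, but the activation gradient still flows through the frozen dense weight. This is an illustrative reconstruction with made-up dimensions, not code from the paper.

```python
# Illustrative LoRA backward pass with explicit shapes (dimensions are made up).
# Forward: y = x @ W + (x @ A) @ B, with W frozen and A, B the rank-r adapters.
import torch

d, r, batch = 4096, 16, 8
W = torch.randn(d, d)                          # frozen pretrained weight
A, B = torch.randn(d, r), torch.randn(r, d)    # LoRA factors
x = torch.randn(batch, d)                      # layer input (activation)
dY = torch.randn(batch, d)                     # gradient w.r.t. the layer output

# Adapter gradients: only O(batch * d * r) FLOPs thanks to the low rank.
dB = (x @ A).t() @ dY
dA = x.t() @ (dY @ B.t())

# Activation gradient: a dense O(batch * d^2) multiplication through W,
# the step the summary identifies as the compute bottleneck.
dX = dY @ W.t() + (dY @ B.t()) @ A.t()
```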
📝 Abstract
Large Language Models (LLMs) demonstrate exceptional performance across a wide range of tasks but demand substantial computational resources even for fine-tuning. Although Low-Rank Adaptation (LoRA) significantly alleviates memory consumption during fine-tuning, it does little to reduce computational cost. This paper identifies the computation of activation gradients as the primary bottleneck in LoRA's backward propagation and introduces the Computation-Efficient LoRA (CE-LoRA) algorithm, which improves computational efficiency while preserving memory efficiency. CE-LoRA leverages two key techniques: Approximated Matrix Multiplication, which replaces dense multiplications of large, complete matrices with sparse multiplications involving only critical rows and columns, and the Double-LoRA technique, which reduces error propagation in activation gradients. Theoretically, CE-LoRA converges at the same rate as LoRA, $\mathcal{O}(1/\sqrt{T})$, where $T$ is the number of iterations. Empirical evaluations confirm that CE-LoRA significantly reduces computational costs compared to LoRA without notable performance degradation.
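As a rough illustration of the Approximated Matrix Multiplication idea described above, the sketch below approximates a dense product by keeping only the most "critical" column/row pairs. The norm-product importance score and the choice of `k` are assumptions made for illustration; the paper's actual selection criterion may differ.

```python
# Minimal sketch of approximated matrix multiplication: keep only the k
# column/row pairs with the largest importance scores (assumed here to be
# the product of column and row norms; this criterion is an assumption).
import torch

def approx_matmul(A: torch.Tensor, B: torch.Tensor, k: int) -> torch.Tensor:
    scores = A.norm(dim=0) * B.norm(dim=1)   # importance of each inner-dimension index
    idx = torch.topk(scores, k).indices      # indices of the k most critical pairs
    return A[:, idx] @ B[idx, :]             # reduced product: O(m * k * p) FLOPs

# The exact product costs O(m * n * p) FLOPs; with k << n the approximation is much cheaper.
A, B = torch.randn(512, 4096), torch.randn(4096, 512)
C_approx = approx_matmul(A, B, k=256)
```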