GraLoRA: Granular Low-Rank Adaptation for Parameter-Efficient Fine-Tuning

📅 2025-05-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
LoRA suffers from a structural bottleneck at high ranks that entangles gradients across unrelated input channels, leading to overfitting and performance saturation and limiting how closely it can approximate full fine-tuning (FFT). To address this, we propose Granular Low-Rank Adaptation (GraLoRA), the first sub-block-level low-rank adaptation framework: it partitions the weight matrix into fine-grained blocks and assigns each block an independent low-rank adapter, explicitly decoupling gradient propagation paths. This design adds virtually no extra parameters or computational overhead while substantially increasing representational capacity and FFT approximation fidelity. On HumanEval+, GraLoRA achieves up to a +8.5% absolute gain in Pass@1, consistently outperforming LoRA and other PEFT baselines across diverse model scales and rank configurations, demonstrating strong robustness and scalability.

📝 Abstract
Low-Rank Adaptation (LoRA) is a popular method for parameter-efficient fine-tuning (PEFT) of generative models, valued for its simplicity and effectiveness. Despite recent enhancements, LoRA still suffers from a fundamental limitation: overfitting when the bottleneck is widened. It performs best at ranks 32-64, yet its accuracy stagnates or declines at higher ranks, still falling short of full fine-tuning (FFT) performance. We identify the root cause as LoRA's structural bottleneck, which introduces gradient entanglement across unrelated input channels and distorts gradient propagation. To address this, we introduce a novel structure, Granular Low-Rank Adaptation (GraLoRA), that partitions weight matrices into sub-blocks, each with its own low-rank adapter. With negligible computational or storage cost, GraLoRA overcomes LoRA's limitations, effectively increases the representational capacity, and more closely approximates FFT behavior. Experiments on code generation and commonsense reasoning benchmarks show that GraLoRA consistently outperforms LoRA and other baselines, achieving up to a +8.5% absolute gain in Pass@1 on HumanEval+. These improvements hold across model sizes and rank settings, making GraLoRA a scalable and robust solution for PEFT. Code, data, and scripts are available at https://github.com/SqueezeBits/GraLoRA.git
Problem

Research questions and friction points this paper is trying to address.

LoRA overfits when the bottleneck is widened
LoRA's structural bottleneck causes gradient entanglement
GraLoRA partitions matrices to improve representational capacity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Partitions weight matrices into sub-blocks
Each sub-block has its own low-rank adapter
Increases representational capacity without extra cost
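The partitioning idea in the bullets above can be sketched in a few lines: split the weight matrix into a k x k grid of sub-blocks and give each block an independent low-rank adapter of rank r/k, so k = 1 reduces to plain LoRA. This is a minimal NumPy illustration under assumed details, not the authors' implementation; the function name `gralora_delta` and the random initialization are hypothetical. It also checks the summary's claim that the adapter parameter count stays essentially unchanged relative to LoRA.

```python
import numpy as np

def gralora_delta(out_dim, in_dim, rank, k, rng):
    """Sketch of a GraLoRA-style weight update (not the official code):
    partition the (out_dim x in_dim) weight matrix into a k x k grid of
    sub-blocks, each with its own rank-(rank // k) adapter. k = 1 is LoRA."""
    sub_rank = rank // k                    # rank budget per sub-block
    bo, bi = out_dim // k, in_dim // k      # sub-block dimensions
    delta = np.zeros((out_dim, in_dim))
    n_params = 0
    for i in range(k):
        for j in range(k):
            # Each block gets its own down/up factors, so the gradient
            # path of one block is decoupled from every other block.
            B = rng.standard_normal((bo, sub_rank)) * 0.01
            A = rng.standard_normal((sub_rank, bi)) * 0.01
            delta[i * bo:(i + 1) * bo, j * bi:(j + 1) * bi] = B @ A
            n_params += B.size + A.size
    return delta, n_params

rng = np.random.default_rng(0)
_, p_lora = gralora_delta(64, 64, 8, 1, rng)  # plain LoRA (k = 1)
_, p_gra = gralora_delta(64, 64, 8, 4, rng)   # GraLoRA with a 4x4 grid
print(p_lora, p_gra)  # both 1024: same adapter parameter budget
```

With k² blocks of rank r/k and size (out/k x in/k), the total parameter count is k² · (r/k)(out/k + in/k) = r(out + in), identical to rank-r LoRA, which is why the grid adds capacity without adding parameters.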
Yeonjoon Jung (SqueezeBits, POSTECH)
Daehyun Ahn (Squeezebits Inc.)
Hyungjun Kim (SqueezeBits)
Taesu Kim (SqueezeBits)
Eunhyeok Park (POSTECH)
neural network optimization · energy efficient hardware design