🤖 AI Summary
Full-parameter fine-tuning of large language models (LLMs) incurs prohibitive computational costs, while existing parameter-efficient fine-tuning (PEFT) methods such as LoRA still suffer from gradient vanishing and parameter compression bottlenecks. To address these limitations, this paper proposes LoR2C, a low-rank residual connection fine-tuning method embedded within Transformer layers. Its core innovation is the introduction of a low-rank residual adaptation mechanism, coupled with three structural variants (Share, Merge, and Inject) that enable cross-layer parameter sharing, module fusion, and lightweight injection, respectively. Extensive experiments across diverse NLU and NLG benchmarks demonstrate that LoR2C reduces trainable parameters by 30%–60% relative to LoRA while maintaining or improving task performance. Moreover, LoR2C consistently outperforms state-of-the-art PEFT approaches in both efficiency and effectiveness.
📝 Abstract
In recent years, pretrained large language models have demonstrated outstanding performance across various natural language processing tasks. However, full-parameter fine-tuning methods require adjusting all model parameters, leading to immense computational resource demands. Although parameter-efficient fine-tuning methods like LoRA have significantly reduced the number of parameters, they still face challenges such as gradient vanishing and leave room for further parameter reduction. To address these issues, this paper proposes a novel parameter-efficient fine-tuning method called LoR2C (Low-Rank Residual Connection Adaptation). LoR2C introduces residual connections with low-rank matrices within the model layers, which not only reduces the number of fine-tuning parameters but also effectively alleviates the gradient vanishing problem. Additionally, this paper presents three optimization variants of LoR2C: ShareLoR2C, MergeLoR2C, and InjectLoR2C. These variants further improve parameter efficiency and model performance through parameter sharing, module merging, and injection mechanisms, respectively. Experimental results on multiple natural language understanding and natural language generation tasks demonstrate that LoR2C and its optimized variants significantly reduce parameter overhead while maintaining or even improving performance, outperforming existing mainstream parameter-efficient fine-tuning methods. Our code is publicly available at https://github.com/Oblivioniss/LoR2C.
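As a rough illustration of the general idea (not the authors' implementation, which is in the linked repository), the sketch below computes a low-rank residual update y = x + (x·A)·B with A of shape d×r and B of shape r×d, and compares the trainable-parameter count against a full d×d update. The dimensions, zero-initialization of B, and helper names are illustrative assumptions.

```python
import random

def matmul(X, Y):
    """Naive matrix multiply for small illustrative matrices."""
    inner, cols = len(Y), len(Y[0])
    return [[sum(X[i][k] * Y[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(len(X))]

def low_rank_residual(x, A, B):
    """y = x + (x @ A) @ B : a low-rank update added on a residual path."""
    delta = matmul(matmul(x, A), B)
    return [[xi + di for xi, di in zip(xr, dr)] for xr, dr in zip(x, delta)]

d, r = 8, 2  # toy hidden size and low rank (illustrative values)
random.seed(0)
A = [[random.gauss(0, 0.02) for _ in range(r)] for _ in range(d)]  # d x r
B = [[0.0] * d for _ in range(r)]                                  # r x d, zero-init

x = [[1.0] * d]                  # a single token representation
y = low_rank_residual(x, A, B)

# With B zero-initialized, the residual path is the identity at the start
# of fine-tuning, so training begins from the pretrained behavior.
print(y == x)                    # True

full_params = d * d              # parameters of a full d x d update
lowrank_params = d * r + r * d   # parameters of the low-rank pair
print(full_params, lowrank_params)  # 64 32
```

Because the adapter sits on a residual connection, gradients can flow through the identity branch even when the low-rank branch is small, which is the intuition behind the gradient-vanishing claim in the abstract.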