Hybrid-LoRA: Bridging Full Fine-Tuning and Low-Rank Adaptation for Post-Training

📅 2026-05-12
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the trade-off between computational cost and performance when post-training large language models: full fine-tuning requires substantial GPU memory and training cost, while parameter-efficient methods such as LoRA tend to underperform on complex reasoning tasks. The authors propose a hybrid post-training framework that applies full fine-tuning to the modules least suited to low-rank adaptation and uses LoRA for the rest. To guide module selection under a fixed parameter budget, they introduce the Hybrid-LoRA Score, which quantifies each module's sensitivity to low-rank adaptation. The approach integrates into existing post-training paradigms such as RLVR (e.g., GRPO/GSPO). Experiments show that fully fine-tuning only 10% of the candidate modules, with the remainder adapted by LoRA, nearly matches the performance of full fine-tuning and outperforms the best of four state-of-the-art PEFT baselines by 4.36% on average, with gains of up to 5.65%.
πŸ“ Abstract
Post-training has become essential for adapting large language models (LLMs) to complex downstream behaviors, including instruction following, preference alignment, and multi-step reasoning. Reinforcement learning with verifiable rewards (RLVR) has recently emerged as a particularly effective post-training paradigm for improving reasoning capabilities, with critic-free algorithms such as GRPO and GSPO enabling scalable optimization. However, RLVR post-training with full fine-tuning (FFT) requires substantial GPU memory and incurs high training costs. Although parameter-efficient fine-tuning (PEFT) methods, such as Low-Rank Adaptation (LoRA), effectively reduce computational costs, they often suffer from a noticeable performance gap compared to full fine-tuning in post-training for complex reasoning tasks. In this paper, we propose Hybrid-LoRA, an efficient hybrid post-training framework that selectively applies full fine-tuning to a small subset of modules less suited to low-rank adaptation, while adapting the remaining components with LoRA. We introduce a novel Hybrid-LoRA Score to rank candidate modules according to their sensitivity to low-rank adaptation under a fixed parameter budget. Experiments show that Hybrid-LoRA closely matches full fine-tuning performance under a 10% full fine-tuning module budget, with the remaining candidate modules adapted by LoRA, consistently outperforming four state-of-the-art PEFT post-training baselines, achieving improvements of up to 5.65% and on average 4.36% over the best baseline.
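The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not define the Hybrid-LoRA Score, so the `score` field below is a hypothetical stand-in (higher means less suited to low-rank adaptation), and the module names, layer shapes, LoRA rank, and the reading of the 10% budget as a fraction of candidate modules are all assumptions made for illustration. The only fixed fact used is the standard LoRA parameter count: a rank-r adapter on a d_out x d_in weight trains r * (d_in + d_out) parameters instead of d_in * d_out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    name: str      # hypothetical module name
    d_in: int      # input dimension of the weight matrix
    d_out: int     # output dimension of the weight matrix
    score: float   # stand-in sensitivity score, NOT the paper's Hybrid-LoRA Score


def plan_hybrid(modules, fft_fraction=0.10, lora_rank=16):
    """Assign each module to 'fft' or 'lora' and count trainable parameters."""
    # Assumption: the budget is a fraction of the candidate modules.
    n_fft = round(fft_fraction * len(modules))
    # Modules with the highest score are treated as least suited to LoRA.
    ranked = sorted(modules, key=lambda m: m.score, reverse=True)
    fft_names = {m.name for m in ranked[:n_fft]}

    plan, trainable = {}, 0
    for m in modules:
        if m.name in fft_names:
            plan[m.name] = "fft"
            trainable += m.d_in * m.d_out                 # full weight matrix
        else:
            plan[m.name] = "lora"
            trainable += lora_rank * (m.d_in + m.d_out)   # low-rank factors A and B
    return plan, trainable


# Toy example: ten 1024x1024 modules with made-up scores 0.0, 0.1, ..., 0.9.
modules = [Module(f"layer{i}.proj", 1024, 1024, i / 10) for i in range(10)]
plan, trainable = plan_hybrid(modules)
full = sum(m.d_in * m.d_out for m in modules)

print(plan["layer9.proj"], trainable, full)
```

In this toy setting one module is fully fine-tuned and nine receive rank-16 adapters, giving 1,343,488 trainable parameters against 10,485,760 for full fine-tuning. In a real training setup the resulting plan would decide which weights are unfrozen directly and which are wrapped with adapters; that wiring is framework-specific and is left out here.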
Problem

Research questions and friction points this paper is trying to address.

post-training
full fine-tuning
Low-Rank Adaptation
parameter-efficient fine-tuning
reasoning tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid-LoRA
Low-Rank Adaptation
Full Fine-Tuning
Post-Training
Parameter-Efficient Fine-Tuning