Come Together, But Not Right Now: A Progressive Strategy to Boost Low-Rank Adaptation

📅 2025-06-06
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
LoRA adapters often converge to suboptimal solutions near initialization, resulting in poor generalization and low robustness to merging and pruning. To address this, we propose CoTo, a progressive training strategy that introduces stochastic adapter deactivation—dynamically increasing activation probability during training to balance loss landscape exploration and optimization. We theoretically establish that CoTo enhances inter-layer dropout stability and linear mode connectivity. Furthermore, we devise the first Shapley-value-based framework grounded in cooperative game theory to quantify the marginal contribution of each adapter. CoTo is lightweight and fully compatible with mainstream variants such as QLoRA and AdaLoRA. Experiments demonstrate that CoTo improves single-task fine-tuning accuracy by 1.8% on average, boosts multi-task adapter merging accuracy by 4.2%, reduces post-pruning performance degradation by 37%, and cuts training overhead by 15%.

Technology Category

Application Category

📝 Abstract
Low-rank adaptation (LoRA) has emerged as a leading parameter-efficient fine-tuning technique for adapting large foundation models, yet it often locks adapters into suboptimal minima near their initialization. This hampers model generalization and limits downstream operators such as adapter merging and pruning. Here, we propose CoTo, a progressive training strategy that gradually increases adapters' activation probability over the course of fine-tuning. By stochastically deactivating adapters, CoTo encourages more balanced optimization and broader exploration of the loss landscape. We provide a theoretical analysis showing that CoTo promotes layer-wise dropout stability and linear mode connectivity, and we adopt a cooperative-game approach to quantify each adapter's marginal contribution. Extensive experiments demonstrate that CoTo consistently boosts single-task performance, enhances multi-task merging accuracy, improves pruning robustness, and reduces training overhead, all while remaining compatible with diverse LoRA variants. Code is available at https://github.com/zwebzone/coto.
Problem

Research questions and friction points this paper is trying to address.

Improves Low-rank adaptation (LoRA) to avoid suboptimal minima
Enhances model generalization and downstream operations
Proposes progressive training strategy for balanced optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Progressive training strategy for LoRA
Stochastic adapter deactivation for optimization
Cooperative-game approach for adapter contribution
🔎 Similar Papers
No similar papers found.
Z
Zhuang Zhan
Southern University of Science and Technology, Shenzhen, China; City University of Hong Kong, Hong Kong SAR, China
X
Xiequn Wang
Southern University of Science and Technology, Shenzhen, China
W
Wei Li
Southern University of Science and Technology, Shenzhen, China
Yulong Zhang
Yulong Zhang
Google
Security and Privacy
Qiushi Huang
Qiushi Huang
University of Surrey
Natural Language ProcessingNatural Language UnderstandingNatural Language Generation
Shuhao Chen
Shuhao Chen
HKUST, SUSTech
Transfer LearningLarge Language Model
Xuehao Wang
Xuehao Wang
Zhejiang University
Multi-Task LearningSegment Anything ModelPEFTLLM
Y
Yanbin Wei
Southern University of Science and Technology, Shenzhen, China; Hong Kong University of Science and Technology, Hong Kong SAR, China
Y
Yuhe Nie
New York University, New York, USA
Kede Ma
Kede Ma
Associate Professor of Computer Science, City University of Hong Kong
Image ProcessingComputational VisionComputational PhotographyMultimedia Forensics
Y
Yu Zhang
Southern University of Science and Technology, Shenzhen, China
Ying Wei
Ying Wei
Zhejiang University
Machine LearningTransfer LearningContinual LearningAI for Science