TreeLoRA: Efficient Continual Learning via Layer-Wise LoRAs Guided by a Hierarchical Gradient-Similarity Tree

πŸ“… 2025-06-12
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
To address the dual challenges of catastrophic forgetting and high computational overhead in continual learning of large pre-trained models (LPMs) on streaming data, this paper proposes a gradient-similarity-driven hierarchical LoRA adaptation framework. Our method dynamically activates task-aware LoRA modules per layer by constructing a gradient-similarity hierarchy based on K-d trees; efficiently explores task structure via a Lower Confidence Bound (LCB)-bandit algorithm; and optimizes parameter updates through sparse gradient propagation. Theoretical analysis guarantees convergence. Experiments on ViT and LLMs demonstrate that our approach achieves an average 5.7% improvement in task accuracy over state-of-the-art continual learning methods, reduces memory consumption by 42%, and accelerates training by 3.1Γ—β€”striking a significant balance between performance and efficiency.

Technology Category

Application Category

πŸ“ Abstract
Many real-world applications collect data in a streaming environment, where learning tasks are encountered sequentially. This necessitates continual learning (CL) to update models online, enabling adaptation to new tasks while preserving past knowledge to prevent catastrophic forgetting. Nowadays, with the flourish of large pre-trained models (LPMs), efficiency has become increasingly critical for CL, due to their substantial computational demands and growing parameter sizes. In this paper, we introduce TreeLoRA (K-D Tree of Low-Rank Adapters), a novel approach that constructs layer-wise adapters by leveraging hierarchical gradient similarity to enable efficient CL, particularly for LPMs. To reduce the computational burden of task similarity estimation, we employ bandit techniques to develop an algorithm based on lower confidence bounds to efficiently explore the task structure. Furthermore, we use sparse gradient updates to facilitate parameter optimization, making the approach better suited for LPMs. Theoretical analysis is provided to justify the rationale behind our approach, and experiments on both vision transformers (ViTs) and large language models (LLMs) demonstrate the effectiveness and efficiency of our approach across various domains, including vision and natural language processing tasks.
Problem

Research questions and friction points this paper is trying to address.

Efficient continual learning for large pre-trained models
Prevent catastrophic forgetting in sequential task learning
Reduce computational burden in task similarity estimation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Layer-wise LoRAs guided by gradient-similarity tree
Bandit techniques for efficient task similarity estimation
Sparse gradient updates for parameter optimization
πŸ”Ž Similar Papers
No similar papers found.
Yu-Yang Qian
Yu-Yang Qian
Nanjing University
Scalable MLLarge-scale MLEfficiencyLLMOnline Learning
Y
Yuan-Ze Xu
National Key Laboratory for Novel Software Technology, Nanjing University, China; School of Artificial Intelligence, Nanjing University, China
Z
Zhenyu Zhang
RIKEN AIP, Tokyo, Japan
P
Peng Zhao
National Key Laboratory for Novel Software Technology, Nanjing University, China; School of Artificial Intelligence, Nanjing University, China
Zhi-Hua Zhou
Zhi-Hua Zhou
Nanjing University
Artificial IntelligenceMachine LearningData Mining