🤖 AI Summary
Balancing stability and plasticity remains a fundamental challenge in continual learning. To address this, we propose a LoRA-based gradient subspace splitting framework that theoretically derives optimal orthogonal gradient subspaces via subspace analysis, enabling strict separation of gradients from new and old tasks at the parameter update level. Unlike heuristic projection methods, our approach is the first to rigorously integrate LoRA with gradient subspace theory, yielding a dynamically adaptive splitting mechanism with provable guarantees on the stability–plasticity trade-off. Extensive experiments on multiple standard continual learning benchmarks demonstrate state-of-the-art performance: our method significantly mitigates catastrophic forgetting while simultaneously enhancing adaptation to new tasks. Theoretical analysis and empirical validation jointly confirm that the learned orthogonal subspaces effectively decouple task-specific gradient updates, preserving prior knowledge without compromising learnability of novel tasks.
📝 Abstract
Continual learning (CL) requires a model to learn multiple tasks in sequence while maintaining both stability (preserving knowledge from previously learned tasks) and plasticity (effectively learning new tasks). Gradient projection has emerged as an effective and popular paradigm in CL: it partitions the gradient space of previously learned tasks into two orthogonal subspaces, a primary subspace and a minor subspace. New tasks are learned within the minor subspace, thereby reducing interference with previously acquired knowledge. However, existing gradient projection methods struggle to achieve an optimal balance between plasticity and stability, because it is difficult to partition the gradient space appropriately. In this work, we consider a continual learning paradigm based on Low-Rank Adaptation (LoRA), which has gained considerable attention for its efficiency and wide applicability, and propose a novel approach for continual learning, called SplitLoRA. We first provide a theoretical analysis of how subspace partitioning affects model stability and plasticity. Informed by this analysis, we then introduce an effective method that derives the optimal partition of the gradient space for previously learned tasks, effectively balancing stability and plasticity in continual learning. Experimental results on multiple datasets demonstrate that the proposed method achieves state-of-the-art performance.
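To make the gradient-projection idea concrete, here is a minimal NumPy sketch, not the paper's SplitLoRA algorithm: past-task gradients are decomposed via SVD, the top-k left singular vectors form the primary subspace, and a new task's gradient is projected onto the orthogonal (minor) subspace before the update. All function names and the rank cutoff `k` are illustrative assumptions.

```python
import numpy as np

def primary_basis(old_grads, k):
    """Return an orthonormal basis (top-k left singular vectors) of the
    primary subspace spanned by gradients from previously learned tasks."""
    U, S, _ = np.linalg.svd(old_grads, full_matrices=False)
    return U[:, :k]

def project_to_minor(grad, P):
    """Remove the component of a new-task gradient that lies in the
    primary subspace P, so the parameter update stays in the minor
    subspace and interferes less with old-task knowledge."""
    return grad - P @ (P.T @ grad)

# toy usage: 10-dim parameter vector, 5 stored old-task gradients
rng = np.random.default_rng(0)
old_grads = rng.standard_normal((10, 5))
P = primary_basis(old_grads, k=3)
g_new = rng.standard_normal(10)
g_proj = project_to_minor(g_new, P)
# the projected gradient is (numerically) orthogonal to the primary basis
print(np.allclose(P.T @ g_proj, 0.0, atol=1e-8))  # True
```

The key trade-off the abstract describes lives in the choice of `k`: a larger primary subspace protects more old knowledge (stability) but leaves less room for new-task updates (plasticity); SplitLoRA's contribution is deriving this partition rather than choosing it heuristically.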