Continuous Subspace Optimization for Continual Learning

📅 2025-05-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Continual learning suffers from catastrophic forgetting, and existing low-rank adaptation methods are constrained to a fixed low-rank subspace, limiting the trade-off between parameter efficiency and representational capacity. To address this, the paper proposes Continuous Subspace Optimization for Continual Learning (CoSO): a framework that dynamically constructs a sequence of task-specific subspaces via singular value decomposition (SVD) of the gradients and performs updates by projecting gradients onto each subspace. CoSO enforces orthogonality between the current task's subspaces and the historical task subspace, and after each task it incrementally folds a task-specific component into that historical subspace, yielding memory-efficient optimization that resists forgetting. Rather than the static single-subspace paradigm of prior low-rank methods, CoSO optimizes over a dynamically evolving sequence of subspaces. Empirically, it achieves state-of-the-art performance across multiple benchmark datasets, with especially strong gains on long task sequences.

📝 Abstract
Continual learning aims to learn multiple tasks sequentially while preserving prior knowledge, but faces the challenge of catastrophic forgetting when acquiring new knowledge. Recently, approaches leveraging pre-trained models have gained increasing popularity to mitigate this issue, due to the strong generalization ability of foundation models. To adjust pre-trained models for new tasks, existing methods usually employ low-rank adaptation, which restricts parameter updates to a fixed low-rank subspace. However, constraining the optimization space inherently compromises the model's learning capacity, resulting in inferior performance. To address the limitation, we propose Continuous Subspace Optimization for Continual Learning (CoSO) to fine-tune the model in a series of subspaces rather than a single one. These sequential subspaces are dynamically determined through the singular value decomposition of gradients. CoSO updates the model by projecting gradients into these subspaces, ensuring memory-efficient optimization. To mitigate forgetting, the optimization subspaces of each task are set to be orthogonal to the historical task subspace. During task learning, CoSO maintains a task-specific component that captures the critical update directions associated with the current task. Upon completing a task, this component is used to update the historical task subspace, laying the groundwork for subsequent learning. Extensive experiments on multiple datasets demonstrate that CoSO significantly outperforms state-of-the-art methods, especially in challenging scenarios with long task sequences.
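The core mechanism described above — deriving an update subspace from the SVD of the gradient and restricting updates to it — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation; the matrix sizes, rank `r`, and learning rate are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))   # a weight matrix being fine-tuned
G = rng.standard_normal((64, 32))   # its full gradient at one step
r = 4                               # low-rank budget (illustrative)

# Determine the subspace from the gradient's top-r left singular vectors.
U, _, _ = np.linalg.svd(G, full_matrices=False)
P = U[:, :r]                        # (64, r) orthonormal basis

# Project the gradient into the subspace; the compact (r, 32) form is
# what makes the optimizer state memory-efficient.
G_low = P.T @ G

# Map the update back and apply it, restricted to span(P).
W -= 0.01 * (P @ G_low)
```

Note that `P @ G_low = P @ P.T @ G` is exactly the orthogonal projection of the gradient onto the chosen subspace, so the weight update never leaves `span(P)`.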
Problem

Research questions and friction points this paper is trying to address.

Mitigate catastrophic forgetting in continual learning
Optimize model in dynamic subspaces for better performance
Ensure memory-efficient and orthogonal task learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic subspaces via gradient SVD optimization
Orthogonal subspaces to prevent catastrophic forgetting
Task-specific components for sequential subspace updates
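The orthogonality and subspace-update ideas in the bullets above can be sketched together: before extracting a new task's subspace, project the gradient away from the historical subspace, then merge the new directions into the history once the task finishes. A minimal sketch under assumed shapes and rank (the names `H_hist`, `P_new`, and `r` are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
# Orthonormal basis of the historical task subspace (64-dim, 8 directions).
H_hist = np.linalg.qr(rng.standard_normal((64, 8)))[0]
G = rng.standard_normal((64, 32))   # a new task's gradient
r = 4                               # rank of the new task subspace

# Remove components lying in the historical subspace, so the new
# subspace is orthogonal to past tasks' update directions.
G_perp = G - H_hist @ (H_hist.T @ G)

# New task subspace from the SVD of the deflated gradient.
U, _, _ = np.linalg.svd(G_perp, full_matrices=False)
P_new = U[:, :r]

# After the task, fold the task-specific component into the history,
# re-orthonormalizing the combined basis.
H_updated = np.linalg.qr(np.hstack([H_hist, P_new]))[0]
```

Because `G_perp` has no component in `span(H_hist)`, its left singular vectors (and hence `P_new`) are automatically orthogonal to the historical subspace, which is the mechanism the paper credits with preventing interference with prior tasks.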
Quan Cheng
National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China; School of Artificial Intelligence, Nanjing University, Nanjing, China
Yuanyu Wan
Zhejiang University
Machine Learning, Online Learning, Distributed Optimization
Lingyu Wu
National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China; School of Artificial Intelligence, Nanjing University, Nanjing, China
Chenping Hou
National University of Defense Technology
Statistical Data Analysis, Data Mining, Machine Learning
Lijun Zhang
National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China; School of Artificial Intelligence, Nanjing University, Nanjing, China