AI Summary
This work addresses catastrophic forgetting in vision-language models during continual learning, which the authors attribute to interference between the overlapping low-rank subspaces occupied by different tasks' parameter updates. To mitigate it, they propose HDSD, a Hierarchical Dual-Subspace Decoupling framework, which they present as the first to disentangle parameter updates from a subspace-structure perspective. A lightweight Feature Modulation Module (FMM) decomposes the parameter space into general (shared) and task-specific subspaces. On top of it, a General Fusion Module (GFM) applies an adaptive threshold to relative parameter changes across tasks to capture stable, transferable knowledge, and a Hierarchical Learning Module (HLM) uses singular value decomposition (SVD) with a parameter scaling mechanism to constrain updates to distinct subspace scales. Together these components reduce cross-task interference and parameter drift. On multiple standard continual learning benchmarks, the method is reported to significantly alleviate forgetting while improving new-task acquisition, achieving state-of-the-art performance.
Abstract
Class-incremental learning aims to continuously acquire new knowledge while preserving previously learned information, thereby mitigating catastrophic forgetting. Existing methods primarily restrict parameter updates but often overlook their structural properties in high-dimensional spaces. From a subspace perspective, updates induced by different tasks tend to lie in multiple overlapping low-rank subspaces, leading to cross-task subspace interference and severe forgetting. To address this issue, we propose HDSD, a Hierarchical Dual-Subspace Decoupling framework for continual learning in vision-language models. Specifically, we introduce a lightweight Feature Modulation Module (FMM) that explicitly decomposes the parameter space into general and task-specific subspaces. Building on this design, we develop two complementary components. First, a General Fusion Module (GFM) evaluates relative parameter changes across tasks and uses an adaptive threshold to capture stable and transferable knowledge. Second, a Hierarchical Learning Module (HLM) performs structured parameter decomposition via Singular Value Decomposition (SVD) and uses a scaling mechanism to constrain updates within distinct subspace scales. Together, these designs reduce subspace interference and parameter drift. Extensive experiments on conventional benchmarks show that HDSD achieves state-of-the-art results.
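The two mechanisms named in the abstract, thresholding on relative parameter change and SVD-based scaling of updates, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not say how the adaptive threshold is computed or how the subspace scales are chosen, so the mean-based threshold, the rank cutoff, the scaling factors, and the function names `stable_mask` and `scaled_svd_update` are all assumptions made purely for illustration.

```python
import numpy as np


def stable_mask(w_prev, w_new, eps=1e-8):
    """Flag parameters whose relative change falls below an adaptive threshold.

    Assumption: the threshold is the mean relative change over the tensor.
    The abstract does not specify how HDSD sets its threshold.
    """
    rel_change = np.abs(w_new - w_prev) / (np.abs(w_prev) + eps)
    threshold = rel_change.mean()
    return rel_change < threshold


def scaled_svd_update(delta, rank, scale_major=1.0, scale_minor=0.1):
    """Split an update into a leading rank-`rank` part and a residual via SVD,
    then rescale the two parts separately.

    Assumption: `rank`, `scale_major`, and `scale_minor` are hypothetical
    hyperparameters, not values taken from the paper.
    """
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    # Reconstruct the component spanned by the top-`rank` singular directions.
    major = (u[:, :rank] * s[:rank]) @ vt[:rank]
    # Everything else lives in the remaining (smaller-scale) directions.
    minor = delta - major
    return scale_major * major + scale_minor * minor


# Toy weights and a small task-induced update.
rng = np.random.default_rng(0)
w_prev = rng.normal(size=(6, 4))
delta = rng.normal(scale=0.05, size=(6, 4))
w_new = w_prev + delta

mask = stable_mask(w_prev, w_new)               # True where parameters barely moved
constrained = scaled_svd_update(delta, rank=2)  # update with the residual damped
```

In this sketch, `mask` marks parameters that changed little between tasks and could be treated as general knowledge, while `constrained` keeps the dominant directions of the update and shrinks the rest. Because the leading part and the residual are orthogonal, damping the residual can only reduce the Frobenius norm of the update.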