🤖 AI Summary
In exemplar-free continual learning, state-space models (SSMs) suffer from catastrophic forgetting due to uncontrolled internal state evolution. To address this, we introduce, for the first time, infinite-dimensional Grassmann manifold geometry into SSM-based continual learning, proposing a weight-agnostic geometric regularization method that stabilizes state dynamics by constraining the infinite-horizon evolution trajectory of the extended observability subspace. Leveraging the SSM's structural properties, we design an efficient *O*(*n*²) solver based on the Sylvester equation. The method is modular and integrates seamlessly with mainstream continual learning approaches. Extensive experiments on benchmarks including ImageNet-R and Caltech-256 demonstrate significant reductions in forgetting and substantial improvements in average multi-task accuracy, validating the method's effectiveness, computational efficiency, and generalizability.
📝 Abstract
State-Space Models (SSMs) excel at capturing long-range dependencies with structured recurrence, making them well-suited for sequence modeling. However, their evolving internal states pose challenges when adapting them under Continual Learning (CL). This is particularly difficult in exemplar-free settings, where the absence of prior data leaves updates to the dynamic SSM states unconstrained, resulting in catastrophic forgetting. To address this, we propose Inf-SSM, a novel and simple geometry-aware regularization method that utilizes the geometry of the infinite-dimensional Grassmannian to constrain state evolution during CL. Unlike classical continual learning methods that constrain weight updates, Inf-SSM regularizes the infinite-horizon evolution of SSMs encoded in their extended observability subspace. We show that enforcing this regularization requires solving a matrix equation known as the Sylvester equation, which typically incurs $\mathcal{O}(n^3)$ complexity. We develop an $\mathcal{O}(n^2)$ solution by exploiting the structure and properties of SSMs. This leads to an efficient regularization mechanism that can be seamlessly integrated into existing CL methods. Comprehensive experiments on challenging benchmarks, including ImageNet-R and Caltech-256, demonstrate a significant reduction in forgetting while improving accuracy across sequential tasks.
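To make the Sylvester connection concrete, the sketch below sets up a toy example of the kind of equation involved: the infinite-horizon observability Gramian of a continuous-time SSM with state matrix $A$ and output matrix $C$ satisfies the Lyapunov (Sylvester-type) equation $A^\top P + P A = -C^\top C$. This is a hedged illustration only: it uses SciPy's general $\mathcal{O}(n^3)$ Bartels–Stewart solver (`scipy.linalg.solve_sylvester`), not the paper's $\mathcal{O}(n^2)$ structured solver, and the matrices `A` and `C` are made-up toy data, not anything from the paper.

```python
import numpy as np
from scipy.linalg import solve_sylvester

# Hypothetical small SSM: stable state matrix A and output matrix C
# (toy data for illustration; not taken from the paper).
rng = np.random.default_rng(0)
n, p = 4, 2
A = -np.eye(n) + 0.1 * rng.standard_normal((n, n))  # eigenvalues near -1, so stable
C = rng.standard_normal((p, n))

# The infinite-horizon observability Gramian P solves the Lyapunov equation
#   A^T P + P A = -C^T C,
# a special case of the Sylvester equation A X + X B = Q.
# SciPy's general solver costs O(n^3); Inf-SSM's contribution is an O(n^2)
# solver exploiting SSM structure, which is not reproduced here.
P = solve_sylvester(A.T, A, -C.T @ C)

# Check that P actually satisfies the equation.
residual = np.linalg.norm(A.T @ P + P @ A + C.T @ C)
print(residual < 1e-8)  # → True
```

The Gramian $P$ summarizes how initial states map to infinite-horizon outputs, which is why constraining the extended observability subspace across tasks reduces to solving equations of this form.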