🤖 AI Summary
This work addresses routing drift in class-incremental learning, a problem that arises when Mixture-of-Experts (MoE) architectures expand their set of expert modules. To balance routing stability for old classes with adaptability to new classes, the authors propose the StaR-MoE framework. Its key contributions are a sensitivity-aware mechanism for aligning routing distributions, presented as the first of its kind, and an asymmetric capacity regularization, both combined with a strategy that freezes previously trained experts. This design mitigates the knowledge interference that arises when old-class samples are misrouted to newly added experts. Experiments on four standard class-incremental learning benchmarks show that StaR-MoE consistently outperforms existing methods in both average accuracy and final-stage accuracy.
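To make the misrouting problem concrete, the following minimal sketch shows how appending one expert to a softmax router can pull routing weight away from the old experts for an old-class sample. The logit values, the helper names `softmax` and `old_expert_mass`, and the assumption of a plain softmax router are all illustrative; they are not taken from the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of router logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def old_expert_mass(weights, num_old):
    """Total routing weight that still lands on the first `num_old` experts."""
    return sum(weights[:num_old])

# Hypothetical router logits for one old-class sample over 3 old experts.
old_logits = [2.0, 0.5, -1.0]
before = softmax(old_logits)

# After expansion the router also emits a logit for a 4th (new) expert.
# The old logits are unchanged, yet the normalization now includes the new one.
expanded_logits = old_logits + [2.5]
after = softmax(expanded_logits)

# Share of routing weight that moved to the new expert.
drift = 1.0 - old_expert_mass(after, num_old=3)
print(f"weight on new expert after expansion: {drift:.3f}")
```

In this toy setting the old experts' parameters never change, but the mixture applied to the old-class sample does, which is the kind of interference the summary attributes to routing drift.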
📝 Abstract
Class-incremental learning (CIL) requires models to learn new classes sequentially while preserving prior knowledge. Recently, approaches that combine pre-trained models with mixture-of-experts (MoE) have received increasing attention in CIL: they typically expand experts during learning and employ a router to assign weights across experts. However, existing MoE methods often overlook routing drift induced by expert expansion. Once new experts are introduced, the router may reassign samples from earlier classes to newly added experts, thereby perturbing previously established expert compositions and causing interference even when old experts remain frozen. We argue that expandable MoE in CIL requires two complementary properties: stable old-class routing for knowledge preservation and sufficient capacity utilization for new-class adaptation. To this end, we propose Stable Routing for MoE (StaR-MoE), a routing-level framework for expandable MoE in CIL. By incorporating sensitivity-aware routing alignment, StaR-MoE aligns current old-class routing behavior with historical routing distributions through sensitivity-guided constraints. Complementarily, StaR-MoE introduces asymmetric capacity regularization to encourage effective utilization of the expanded expert pool without compromising class-specific routing specialization. Extensive experiments across four standard CIL benchmarks demonstrate that StaR-MoE consistently improves both average and last accuracy over state-of-the-art methods, highlighting the importance of stable routing.
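The abstract does not give the form of the alignment objective, so the following is only a sketch of one plausible reading: a KL divergence between stored (historical) and current routing distributions for old-class samples, weighted by a per-sample sensitivity score. The function names, the choice of KL divergence, the per-sample weighting, and all numbers are assumptions made for illustration, not the authors' formulation.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def alignment_penalty(historical, current, sensitivities):
    """Sensitivity-weighted mean KL between historical and current routing.

    historical, current: one routing distribution per old-class sample.
    sensitivities: hypothetical non-negative weight per sample; larger values
    penalize routing changes for that sample more strongly.
    """
    total_weight = sum(sensitivities)
    weighted = sum(
        s * kl_divergence(h, c)
        for h, c, s in zip(historical, current, sensitivities)
    )
    return weighted / total_weight

# Historical routing over 3 old experts, padded with 0.0 for the new 4th expert.
historical = [[0.8, 0.15, 0.05, 0.0], [0.1, 0.7, 0.2, 0.0]]

# Current routing: the first sample has drifted toward the new expert.
drifted = [[0.3, 0.1, 0.05, 0.55], [0.1, 0.7, 0.2, 0.0]]

no_drift = alignment_penalty(historical, historical, [1.0, 1.0])
uniform = alignment_penalty(historical, drifted, [1.0, 1.0])
weighted = alignment_penalty(historical, drifted, [3.0, 1.0])
print(no_drift, uniform, weighted)
```

Under these assumptions the penalty is zero when old-class routing is unchanged, and it grows when a drifted sample is given a higher sensitivity. Because only old-class samples enter the penalty, new-class samples remain free to use the added expert.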