🤖 AI Summary
To address catastrophic forgetting and parameter explosion when large-scale generative models continually learn novel visual concepts, this paper proposes a parameter-efficient framework for lifelong visual concept learning. Methodologically, the authors design a Mixture-of-Experts (MoE) architecture guided by routing distillation, prune redundant experts to compress the parameter count, and introduce a hierarchical local attention mechanism to guide inference, thereby mitigating concept interference and forgetting. The key contribution lies in synergistically integrating routing distillation with structured sparsity for dynamic MoE evolution, jointly optimizing parameter efficiency and knowledge stability. On the CustomConcept-101 benchmark, the approach reduces the forgetting rate by 87.8% and the parameter count by 63.3% relative to state-of-the-art methods, while significantly improving generation fidelity and consistency for both novel and previously learned concepts.
📝 Abstract
Enabling large-scale generative models to continuously learn new visual concepts is essential for personalizing pre-trained models to individual user preferences. Existing approaches to continual visual concept learning are constrained by two fundamental challenges: catastrophic forgetting and parameter expansion. In this paper, we propose Redundancy-Removal Mixture of Experts (R^2MoE), a parameter-efficient framework for lifelong visual concept learning that effectively learns new concepts while incurring minimal parameter overhead. Our framework includes three key innovations. First, we propose a mixture-of-experts framework with a routing distillation mechanism that enables experts to acquire concept-specific knowledge while preserving the gating network's routing capability, thereby effectively mitigating catastrophic forgetting. Second, we propose a strategy for eliminating redundant layer-wise experts that reduces the number of expert parameters by fully utilizing previously learned experts. Third, we employ a hierarchical local attention-guided inference approach to mitigate interference between generated visual concepts. Extensive experiments demonstrate that our method generates images with superior conceptual fidelity compared to state-of-the-art (SOTA) methods, achieving an impressive 87.8% reduction in forgetting rate and 63.3% fewer parameters on the CustomConcept-101 dataset. Our code is available at https://github.com/learninginvision/R2MoE
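To make the routing-distillation idea concrete, here is a minimal, dependency-free sketch of the generic mechanism: the gating network's expert distribution from before a new concept is learned serves as a teacher, and a KL penalty keeps the updated gate's routing close to it. All function names are illustrative; the paper's actual loss, gating architecture, and expert pruning criteria may differ.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of gate logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def routing_distillation_loss(old_gate_logits, new_gate_logits):
    """KL(old || new) between the frozen (teacher) gate's expert
    distribution and the current gate's distribution. Minimizing this
    term discourages the router from reassigning inputs away from the
    experts that hold previously learned concepts -- a generic form of
    routing distillation, not the paper's exact formulation."""
    p = softmax(old_gate_logits)   # teacher routing (before update)
    q = softmax(new_gate_logits)   # student routing (after update)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

When the new gate reproduces the old routing exactly, the penalty is zero; any drift toward different experts makes it positive, so it trades off plasticity for stability in the usual distillation manner.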