🤖 AI Summary
Graph incremental learning suffers from catastrophic forgetting: models readily lose previously acquired knowledge when adapting to newly arriving graph data. Existing approaches preserve historical model behavior holistically, overlooking the heterogeneous transfer value of temporal knowledge: some patterns facilitate positive transfer to new tasks, while others induce distributional shifts. To address this, we propose a dynamic mixture-of-experts (DyMoE) framework that introduces time-aware expert networks for graph incremental learning. DyMoE employs a sequence-aware regularization loss to differentially constrain knowledge evolution across time steps and incorporates a top-$k$ sparse gating mechanism for efficient expert selection and reduced computation. Under the class-incremental setting, DyMoE achieves a 4.92% relative accuracy improvement over the strongest baseline, significantly mitigating catastrophic forgetting while enhancing generalization.
📄 Abstract
Graph incremental learning is a paradigm that adapts trained models to continuously growing graph data over time without retraining on the full dataset. However, standard graph machine learning methods suffer from catastrophic forgetting in incremental settings: previously learned knowledge is overridden by new knowledge. Prior approaches address this by treating the previously trained model as an inseparable unit and using techniques to maintain its old behaviors while learning new knowledge. These approaches, however, do not account for the fact that knowledge acquired at different timestamps contributes differently to learning new tasks: some prior patterns transfer and help learn new data, while others deviate from the new data distribution and are detrimental. To address this, we propose a dynamic mixture-of-experts (DyMoE) approach for incremental learning. Specifically, a DyMoE GNN layer adds new expert networks specialized in modeling the incoming data blocks. We design a customized regularization loss that uses data-sequence information so that existing experts maintain their ability to solve old tasks while helping the new expert learn the new data effectively. As the number of data blocks grows over time, the computational cost of the full mixture-of-experts (MoE) model increases; we therefore introduce a sparse MoE variant in which only the top-$k$ most relevant experts make predictions, significantly reducing computation time. Our model achieves a 4.92% relative accuracy increase over the best baselines on class-incremental learning, demonstrating its effectiveness.
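To make the two core mechanisms concrete, the following is a minimal NumPy sketch of (a) a mixture-of-experts layer that appends one expert per data block and routes each input through only the top-$k$ experts, and (b) a sequence-weighted regularization term that constrains older experts more strongly toward their earlier parameters. All names (`SparseMoE`, `time_weighted_reg`, the decay schedule) are illustrative assumptions, not the paper's actual implementation or weighting.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class SparseMoE:
    """Toy top-k sparse mixture-of-experts layer (hypothetical, linear experts)."""

    def __init__(self, d_in, d_out, k=2):
        self.d_in, self.d_out, self.k = d_in, d_out, k
        self.experts = []                       # one weight matrix per data block
        self.gate = np.zeros((0, d_in))         # one gating vector per expert

    def add_expert(self):
        # When a new data block arrives, append a fresh expert and gating row.
        self.experts.append(0.1 * rng.standard_normal((self.d_in, self.d_out)))
        self.gate = np.vstack([self.gate, 0.1 * rng.standard_normal(self.d_in)])

    def forward(self, x):
        scores = self.gate @ x                  # relevance of each expert to x
        k = min(self.k, len(self.experts))
        top = np.argsort(scores)[-k:]           # only the top-k experts compute
        w = softmax(scores[top])
        return sum(wi * (x @ self.experts[i]) for wi, i in zip(w, top))

def time_weighted_reg(experts, snapshots, decay=0.5):
    """Sequence-weighted penalty: expert t (t=0 is oldest) is pulled toward its
    snapshot with weight decay**t, so older experts drift less. The actual
    weighting in the paper is derived from the data-sequence information."""
    return sum(decay ** t * np.sum((e - s) ** 2)
               for t, (e, s) in enumerate(zip(experts, snapshots)))
```

Because only $k$ experts run per input, the per-sample cost stays roughly constant as the expert pool grows with new data blocks, which is the motivation for the sparse variant described above.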