Separation and Collaboration: Two-Level Routing Grouped Mixture-of-Experts for Multi-Domain Continual Learning

📅 2025-08-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Multi-domain continual learning (MDCL) suffers from dual heterogeneity, in both class sets and data distributions, leading to catastrophic forgetting and forward forgetting. To address this, the paper proposes a Two-Level Routing Grouped Mixture-of-Experts (TRGE) method: (1) dynamically expanding task-specific expert groups to decouple tasks; (2) intra-group routing to mitigate routing overfitting, together with inter-group routing that jointly leverages task identifiers and prototype distances to enhance cross-task knowledge collaboration; (3) semantic task descriptions generated by multimodal large language models to improve task-identification accuracy; and (4) dynamic fusion of frozen CLIP outputs with parameter-efficiently fine-tuned outputs. Evaluated on multiple MDCL benchmarks, TRGE significantly outperforms state-of-the-art methods with substantially fewer trainable parameters, while achieving superior knowledge retention and transferability.

📝 Abstract
Multi-Domain Continual Learning (MDCL) acquires knowledge from sequential tasks whose class sets and data distributions both shift. Although Parameter-Efficient Fine-Tuning (PEFT) methods can adapt to this dual heterogeneity, they still suffer from catastrophic forgetting and forward forgetting. To address these challenges, we propose a Two-Level Routing Grouped Mixture-of-Experts (TRGE) method. First, TRGE dynamically expands the pre-trained CLIP model, assigning a specific expert group to each task to mitigate catastrophic forgetting. Because the total number of experts grows continually in this process, TRGE keeps the expert count within each group fixed and introduces an intra-group router to alleviate the routing overfitting caused by increasing routing complexity. Meanwhile, we design an inter-group routing policy based on task identifiers and task-prototype distance, which dynamically selects relevant expert groups and combines their outputs to enhance inter-task collaboration. Second, to obtain correct task identifiers, we leverage Multimodal Large Language Models (MLLMs), whose powerful multimodal comprehension allows them to generate semantic task descriptions and recognize the correct task identifier. Finally, to mitigate forward forgetting, we dynamically fuse the outputs of the frozen CLIP model and the TRGE adapter for unseen samples based on training progress, leveraging both pre-trained and learned knowledge. Extensive experiments across various settings show that our method outperforms other advanced methods with fewer trainable parameters.
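The two-level routing described above can be sketched as follows. This is a minimal illustrative toy, not the paper's implementation: the linear experts, router matrices, and the softmax-over-negative-distance weighting for unknown task identifiers are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, EXPERTS_PER_GROUP = 8, 4  # fixed expert count per group, as in TRGE

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

class ExpertGroup:
    """One task-specific group: a fixed set of (toy) linear experts plus an
    intra-group router that gates among them."""
    def __init__(self, prototype):
        self.prototype = prototype  # task prototype in feature space
        self.router = rng.normal(size=(DIM, EXPERTS_PER_GROUP))
        self.experts = rng.normal(size=(EXPERTS_PER_GROUP, DIM, DIM)) * 0.1

    def forward(self, x, top_k=2):
        gate = softmax(x @ self.router)          # intra-group routing
        top = np.argsort(gate)[-top_k:]          # keep only top-k experts
        w = gate[top] / gate[top].sum()
        return sum(wi * (x @ self.experts[i]) for wi, i in zip(w, top))

def inter_group_route(x, groups, task_id=None):
    """Inter-group policy: trust the task identifier when it is known;
    otherwise weight groups by proximity to their task prototypes."""
    if task_id is not None:
        return groups[task_id].forward(x)
    dists = np.array([np.linalg.norm(x - g.prototype) for g in groups])
    weights = softmax(-dists)                    # closer prototype -> larger weight
    return sum(w * g.forward(x) for w, g in zip(weights, groups))

groups = [ExpertGroup(rng.normal(size=DIM)) for _ in range(3)]
x = rng.normal(size=DIM)
y = inter_group_route(x, groups)  # identifier unknown: prototype-based mixing
```

The split into a small intra-group router per group and a separate inter-group policy keeps each routing decision over a constant number of choices, which is the mechanism the paper credits for avoiding routing overfitting as experts accumulate.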
Problem

Research questions and friction points this paper is trying to address.

Mitigate catastrophic forgetting in multi-domain continual learning
Alleviate routing overfitting with intra-group routing policy
Reduce forward forgetting by fusing pre-trained and learned knowledge
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-Level Routing Grouped Mixture-of-Experts for MDCL
MLLMs generate semantic task descriptions
Dynamic fusion of CLIP and TRGE outputs
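The output-fusion idea in the last bullet can be illustrated with a simple progress-dependent blend. The linear schedule below is an assumption for the sketch; the paper's actual fusion weighting may differ.

```python
import numpy as np

def fuse_outputs(clip_logits, trge_logits, step, total_steps):
    """Blend frozen-CLIP and adapter predictions: early in training, lean on
    pre-trained knowledge; later, trust the learned adapter more
    (illustrative linear schedule, not the paper's exact rule)."""
    alpha = 1.0 - step / total_steps  # weight on the frozen CLIP output
    return alpha * np.asarray(clip_logits) + (1 - alpha) * np.asarray(trge_logits)

out = fuse_outputs([0.2, 0.8], [0.6, 0.4], step=250, total_steps=1000)
# alpha = 0.75, so out = [0.3, 0.7]
```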
Jialu Zhou
College of Computer Science and Technology, National University of Defense Technology, Changsha, China
Dianxi Shi
Advanced Institute of Big Data, Beijing, China
Shaowu Yang
College of Computer Science and Technology, National University of Defense Technology, Changsha, China
Xinyu Wei
PolyU & PKU
Computer Vision, Deep Learning
Mingyue Yang
College of Computer Science and Technology, National University of Defense Technology, Changsha, China
Leqian Li
College of Computer Science and Technology, National University of Defense Technology, Changsha, China
Mengzhu Wang
National University of Defense Technology
Transfer Learning, Computer Vision
Chunping Qiu
Intelligent Game and Decision Lab, Beijing, China