PASs-MoE: Mitigating Misaligned Co-drift among Router and Experts via Pathway Activation Subspaces for Continual Learning

📅 2026-01-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses catastrophic forgetting and expert role ambiguity in multimodal large language models during continual instruction tuning, which the authors attribute to co-drift between the router and experts. To mitigate this, they propose the Pathway Activation Subspace (PAS) as a capability-aligned coordinate system that decouples router and expert updates: low-rank pathway activation signals from the experts guide router reweighting, while rank directions critical to historical tasks are selectively stabilized. Building on this insight, they introduce a fixed-capacity MoE-LoRA architecture that requires no additional parameters and effectively suppresses co-drift. Experiments demonstrate that the proposed method significantly outperforms existing continual learning and MoE-LoRA approaches on continual instruction tuning benchmarks, with consistent gains in both accuracy and resistance to forgetting.
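The summary above describes PAS-guided reweighting only at a high level. As a rough illustration of the idea, the sketch below reweights a router's logits by how strongly an input activates each expert's low-rank LoRA subspace. All names, shapes, and the `log1p` calibration are our own assumptions for illustration, not the paper's actual formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n_experts = 16, 4, 3  # hidden dim, LoRA rank, number of experts

# Each LoRA expert factorizes its update as B @ A with A: (r, d). The rows
# of A span the expert's low-rank "pathway" subspace (our reading of the
# summary; the paper's exact construction may differ).
A = [rng.normal(size=(r, d)) / np.sqrt(d) for _ in range(n_experts)]

def pas_signal(x, A_e):
    """Per-rank activation magnitudes of input x in one expert's subspace."""
    return np.abs(A_e @ x)  # shape (r,)

def pas_guided_routing(x, router_logits, alpha=1.0):
    """Calibrate router logits with each expert's pathway activation energy."""
    energy = np.array([pas_signal(x, A_e).sum() for A_e in A])
    logits = router_logits + alpha * np.log1p(energy)
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()

x = rng.normal(size=d)
raw_logits = np.zeros(n_experts)  # a neutral router, for illustration only
weights = pas_guided_routing(x, raw_logits)
print(weights)  # routing weights sum to 1
```

Experts whose subspace the input activates more strongly receive more routing mass, which is one way the router's preferences can be anchored to expert capability rather than drifting freely.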

📝 Abstract
Continual instruction tuning (CIT) requires multimodal large language models (MLLMs) to adapt to a stream of tasks without forgetting prior capabilities. A common strategy is to isolate updates by routing inputs to different LoRA experts. However, existing LoRA-based Mixture-of-Experts (MoE) methods often jointly update the router and experts in an indiscriminate way, causing the router's preferences to co-drift with the experts' adaptation pathways and gradually deviate from early-stage input-expert specialization. We term this phenomenon Misaligned Co-drift, which blurs expert responsibilities and exacerbates forgetting. To address this, we introduce the pathway activation subspace (PASs), a LoRA-induced subspace that reflects which low-rank pathway directions an input activates in each expert, providing a capability-aligned coordinate system for routing and preservation. Based on PASs, we propose a fixed-capacity PASs-based MoE-LoRA method with two components: PAS-guided Reweighting, which calibrates routing using each expert's pathway activation signals, and PAS-aware Rank Stabilization, which selectively stabilizes rank directions important to previous tasks. Experiments on a CIT benchmark show that our approach consistently outperforms a range of conventional continual learning baselines and MoE-LoRA variants in both accuracy and anti-forgetting without adding parameters. Our code will be released upon acceptance.
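The abstract's second component, PAS-aware Rank Stabilization, can be pictured as damping gradient updates along rank directions that mattered for earlier tasks. The sketch below is a minimal illustration under our own assumptions: the per-rank `importance` scores, the threshold `tau`, and the simple `1 - importance` damping rule are hypothetical stand-ins, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(1)
r, d = 4, 16  # LoRA rank, hidden dim

A = rng.normal(size=(r, d))            # one expert's LoRA A matrix
grad = rng.normal(size=(r, d))         # gradient from the current task
# Hypothetical per-rank importance accumulated on previous tasks.
importance = np.array([0.9, 0.1, 0.8, 0.05])

def stabilized_update(A, grad, importance, lr=0.1, tau=0.5):
    """Shrink updates on rank directions important to earlier tasks."""
    # Rows above the threshold are damped in proportion to their importance;
    # the rest are updated normally.
    scale = np.where(importance > tau, 1.0 - importance, 1.0)
    return A - lr * scale[:, None] * grad

A_new = stabilized_update(A, grad, importance)
```

Rank directions flagged as important (rows 0 and 2 here) move far less than unimportant ones, which is the selective-stabilization behavior the abstract describes: plasticity for new tasks on free directions, stability on directions that encode old tasks.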
Problem

Research questions and friction points this paper is trying to address.

Continual Instruction Tuning
Mixture-of-Experts
Misaligned Co-drift
LoRA
Catastrophic Forgetting
Innovation

Methods, ideas, or system contributions that make the work stand out.

Pathway Activation Subspaces
Misaligned Co-drift
Mixture-of-Experts
Continual Learning
LoRA
Zhiyan Hou
Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences
Haiyun Guo
Rice University ECE Ph.D.
optical imaging, computational photography, Metalens
Haokai Ma
Postdoctoral Research Fellow, National University of Singapore
Cross-domain Recommendation, LLM for Cybersecurity
Yandu Sun
Southeast University, Nanjing, China
Yonghui Yang
National University of Singapore
Data-centric AI, LLM Safety
Jinqiao Wang
Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences; Wuhan AI Research, Wuhan, China