PASs-MoE: Mitigating Misaligned Co-drift among Router and Experts via Pathway Activation Subspaces for Continual Learning

📅 2026-01-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses catastrophic forgetting and expert role ambiguity in multimodal large language models during continual instruction tuning, which the authors attribute to co-drift between the router and experts. To mitigate this, they propose the Pathway Activation Subspace (PAS) as a capability-aligned coordinate system that decouples router and expert updates: low-rank pathway activation signals from the experts guide router reweighting, while rank directions critical to historical tasks are selectively stabilized. Building on this insight, they introduce a fixed-capacity MoE-LoRA architecture that requires no additional parameters and effectively suppresses co-drift. Experiments demonstrate that the proposed method significantly outperforms existing continual learning and MoE-LoRA approaches on continual instruction tuning benchmarks, with consistent gains in both accuracy and resistance to forgetting.
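The summary above describes PAS-guided reweighting only at a high level. As a rough illustration of the idea, the sketch below reweights a router's logits by how strongly an input activates each expert's low-rank LoRA subspace. All names, shapes, and the `log1p` calibration are our own assumptions for illustration, not the paper's actual formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n_experts = 16, 4, 3  # hidden dim, LoRA rank, number of experts

# Each LoRA expert factorizes its update as B @ A with A: (r, d). The rows
# of A span the expert's low-rank "pathway" subspace (our reading of the
# summary; the paper's exact construction may differ).
A = [rng.normal(size=(r, d)) / np.sqrt(d) for _ in range(n_experts)]

def pas_signal(x, A_e):
    """Per-rank activation magnitudes of input x in one expert's subspace."""
    return np.abs(A_e @ x)  # shape (r,)

def pas_guided_routing(x, router_logits, alpha=1.0):
    """Calibrate router logits with each expert's pathway activation energy."""
    energy = np.array([pas_signal(x, A_e).sum() for A_e in A])
    logits = router_logits + alpha * np.log1p(energy)
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()

x = rng.normal(size=d)
raw_logits = np.zeros(n_experts)  # a neutral router, for illustration only
weights = pas_guided_routing(x, raw_logits)
print(weights)  # routing weights sum to 1
```

Experts whose subspace the input activates more strongly receive more routing mass, which is one way the router's preferences can be anchored to expert capability rather than drifting freely.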

📝 Abstract
Continual instruction tuning (CIT) requires multimodal large language models (MLLMs) to adapt to a stream of tasks without forgetting prior capabilities. A common strategy is to isolate updates by routing inputs to different LoRA experts. However, existing LoRA-based Mixture-of-Experts (MoE) methods often jointly update the router and experts in an indiscriminate way, causing the router's preferences to co-drift with the experts' adaptation pathways and gradually deviate from early-stage input-expert specialization. We term this phenomenon Misaligned Co-drift, which blurs expert responsibilities and exacerbates forgetting. To address this, we introduce the pathway activation subspace (PASs), a LoRA-induced subspace that reflects which low-rank pathway directions an input activates in each expert, providing a capability-aligned coordinate system for routing and preservation. Based on PASs, we propose a fixed-capacity PASs-based MoE-LoRA method with two components: PAS-guided Reweighting, which calibrates routing using each expert's pathway activation signals, and PAS-aware Rank Stabilization, which selectively stabilizes rank directions important to previous tasks. Experiments on a CIT benchmark show that our approach consistently outperforms a range of conventional continual learning baselines and MoE-LoRA variants in both accuracy and anti-forgetting without adding parameters. Our code will be released upon acceptance.
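The abstract's second component, PAS-aware Rank Stabilization, can be pictured as damping gradient updates along rank directions that mattered for earlier tasks. The sketch below is a minimal illustration under our own assumptions: the per-rank `importance` scores, the threshold `tau`, and the simple `1 - importance` damping rule are hypothetical stand-ins, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(1)
r, d = 4, 16  # LoRA rank, hidden dim

A = rng.normal(size=(r, d))            # one expert's LoRA A matrix
grad = rng.normal(size=(r, d))         # gradient from the current task
# Hypothetical per-rank importance accumulated on previous tasks.
importance = np.array([0.9, 0.1, 0.8, 0.05])

def stabilized_update(A, grad, importance, lr=0.1, tau=0.5):
    """Shrink updates on rank directions important to earlier tasks."""
    # Rows above the threshold are damped in proportion to their importance;
    # the rest are updated normally.
    scale = np.where(importance > tau, 1.0 - importance, 1.0)
    return A - lr * scale[:, None] * grad

A_new = stabilized_update(A, grad, importance)
```

Rank directions flagged as important (rows 0 and 2 here) move far less than unimportant ones, which is the selective-stabilization behavior the abstract describes: plasticity for new tasks on free directions, stability on directions that encode old tasks.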
Problem

Research questions and friction points this paper is trying to address.

Continual Instruction Tuning
Mixture-of-Experts
Misaligned Co-drift
LoRA
Catastrophic Forgetting
Innovation

Methods, ideas, or system contributions that make the work stand out.

Pathway Activation Subspaces
Misaligned Co-drift
Mixture-of-Experts
Continual Learning
LoRA
Zhiyan Hou
Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences
Haiyun Guo
Rice University ECE Ph.D.
optical imaging, computational photography, Metalens
Haokai Ma
Postdoctoral Research Fellow, National University of Singapore
Cross-domain Recommendation, LLM for Cybersecurity
Yandu Sun
Southeast University, Nanjing, China
Yonghui Yang
National University of Singapore
Data-centric AI, LLM Safety
Jinqiao Wang
Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences; Wuhan AI Research, Wuhan, China