PRISM: Exposing and Resolving Spurious Isolation in Federated Multimodal Continual Learning

📅 2026-05-01
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses a flaw in existing federated multimodal continual learning approaches, which assume that Mixture-of-Experts (MoE) routing fully isolates task-specific knowledge in disjoint experts. The authors argue that this assumption is unverified: routing operates per sample, gradient conflict persists within each expert, and forgetting accumulates across the task sequence. To remedy this, they propose PRISM, which maintains a per-expert gradient subspace basis whose orthogonality is preserved under federated averaging (FedAvg) and reinterprets MoE routing as a capacity allocator within MoE-LoRA parameter-efficient fine-tuning. Across CoIN-6 and CoIN-Long-10 on LLaVA-1.5-7B, LLaVA-1.5-13B, and Qwen2.5-VL-7B, PRISM outperforms sixteen state-of-the-art baselines in average accuracy, with a margin of +6.06 percentage points over the best federated multimodal baseline on CoIN-Long-10.
📝 Abstract
While current federated multimodal continual learning over mixture-of-experts low-rank adaptation (MoE-LoRA) is built on the unverified assumption that routing isolates task-specific knowledge into disjoint experts, we argue that routing operates per-sample, while forgetting accumulates across the task sequence, and gradient conflict persists within each expert even when routing is maximally polarized. Moreover, activation-subspace protection can also fail because, under parameter-efficient fine-tuning, it entangles tasks due to a dimension-counting bound, and federated averaging (FedAvg) disrupts client-side orthogonality. To address this, we propose PRISM (Per-expert Routing-projection Interference-informed Subspace Method), which maintains a per-expert gradient subspace basis whose orthogonality is preserved under FedAvg and reinterprets MoE routing as a capacity allocator. Our results show that, on LLaVA-1.5-7B, LLaVA-1.5-13B, and Qwen2.5-VL-7B across CoIN-6 and CoIN-Long-10, PRISM outperforms sixteen state-of-the-art baselines in average accuracy. Compared to the best federated multimodal baseline, the performance margin increases from +3.23 pp on CoIN-6 to +6.06 pp on CoIN-Long-10.
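The abstract's remark that FedAvg disrupts client-side orthogonality can be illustrated with a toy example. The sketch below is not PRISM's algorithm, which the abstract does not spell out. It assumes two hypothetical clients that each hold an orthonormal basis, averages the bases element-wise the way FedAvg averages parameters, and shows that the averaged vectors are no longer orthogonal. The Gram-Schmidt repair and the projection step are generic techniques swapped in for illustration, and all function and variable names are invented here.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project_out(grad, basis):
    """Remove from grad its components along each vector of an orthonormal basis."""
    out = list(grad)
    for b in basis:
        coeff = dot(out, b)
        out = [o - coeff * bi for o, bi in zip(out, b)]
    return out

def gram_schmidt(vectors, tol=1e-10):
    """Return an orthonormal basis spanning the given vectors."""
    basis = []
    for v in vectors:
        w = project_out(v, basis)
        norm = math.sqrt(dot(w, w))
        if norm > tol:  # skip vectors already inside the span
            basis.append([x / norm for x in w])
    return basis

def average(bases):
    """Element-wise mean of same-shaped bases (assumed stand-in for FedAvg)."""
    n = len(bases)
    return [[sum(vals) / n for vals in zip(*rows)] for rows in zip(*bases)]

# Hypothetical client bases: each one is orthonormal on its own.
client_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
client_b = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

averaged = average([client_a, client_b])
overlap = dot(averaged[0], averaged[1])  # nonzero: orthogonality is lost

repaired = gram_schmidt(averaged)        # generic repair, not the paper's rule

# Project a new gradient so it has no component in the protected subspace.
gradient = [1.0, 2.0, 3.0]
safe_gradient = project_out(gradient, repaired)
```

Here `overlap` is nonzero even though each client basis was orthonormal, which is the failure mode the abstract describes. After re-orthonormalization, `safe_gradient` is orthogonal to every vector in `repaired`, the property a subspace-protection method relies on to avoid interfering with earlier tasks.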
Problem

Research questions and friction points this paper is trying to address.

federated multimodal continual learning
spurious isolation
MoE-LoRA
gradient conflict
activation-subspace entanglement
Innovation

Methods, ideas, or system contributions that make the work stand out.

federated multimodal continual learning
MoE-LoRA
gradient subspace orthogonality
task isolation
PRISM
🔎 Similar Papers
📅 2024-10-04
IEEE International Symposium on Network Computing and Applications
📈 Citations: 3