PRISM: Exposing and Resolving Spurious Isolation in Federated Multimodal Continual Learning

📅 2026-05-01

📈 Citations: 0

✨ Influential: 0

career value

203K/year

🤖 AI Summary

This work addresses a critical flaw in existing federated multimodal continual learning approaches, which erroneously assume that Mixture-of-Experts (MoE) routing can fully isolate task-specific knowledge, thereby leading to intra-expert gradient conflicts and catastrophic forgetting. To remedy this, the authors propose PRISM, the first method to explicitly expose this assumption’s inadequacy and introduce interference-aware gradient subspace bases. PRISM reframes MoE routing as a capacity allocator and enforces explicit orthogonality among task subspaces during both parameter-efficient fine-tuning—implemented via MoE-LoRA—and federated averaging. Extensive experiments demonstrate that PRISM substantially outperforms 16 state-of-the-art methods on LLaVA and Qwen2.5-VL, achieving a notable +6.06 percentage point improvement over the strongest baseline on the CoIN-Long-10 benchmark.

📝 Abstract

While current federated multimodal continual learning over mixture-of-experts low-rank adaptation (MoE-LoRA) is built on the unverified assumption that routing isolates task-specific knowledge into disjoint experts, we argue that routing operates per-sample, while forgetting accumulates across the task sequence, and gradient conflict persists within each expert even when routing is maximally polarized. Moreover, activation-subspace protection can also fail because, under parameter-efficient fine-tuning, it entangles tasks due to a dimension-counting bound, and federated averaging (FedAvg) disrupts client-side orthogonality. To address this, we propose PRISM (Per-expert Routing-projection Interference-informed Subspace Method), which maintains a per-expert gradient subspace basis whose orthogonality is preserved under FedAvg and reinterprets MoE routing as a capacity allocator. Our results show that, on LLaVA-1.5-7B, LLaVA-1.5-13B, and Qwen2.5-VL-7B across CoIN-6 and CoIN-Long-10, PRISM outperforms sixteen the state of the art baselines in average accuracy. Compared to the best federated multimodal baseline, the performance margin increases from +3.23 pp on CoIN-6 to +6.06 pp on CoIN-Long-10.

Problem

Research questions and friction points this paper is trying to address.

federated multimodal continual learning

spurious isolation

MoE-LoRA

gradient conflict

activation-subspace entanglement

Innovation

Methods, ideas, or system contributions that make the work stand out.

federated multimodal continual learning

MoE-LoRA

gradient subspace orthogonality