🤖 AI Summary
Unified multimodal generative models (UMGMs) suffer from both intra-modal and inter-modal catastrophic forgetting during continual learning, with the latter long overlooked. This paper is the first to systematically identify and quantify inter-modal forgetting. The authors propose Modality-Decoupled Experts (MoDE), a lightweight, scalable architecture that mitigates cross-modal interference through modality-specific experts and gradient-decoupled, isolated parameter updates. To preserve pretrained capabilities, MoDE incorporates noise-free knowledge distillation. Unlike prior approaches, it requires no input perturbation and scales efficiently to new modalities and tasks. Evaluated on multiple multimodal continual-learning benchmarks, MoDE significantly alleviates both intra- and inter-modal forgetting, consistently outperforming state-of-the-art methods across all metrics. The implementation is publicly available.
📝 Abstract
Unified Multimodal Generative Models (UMGMs) unify visual understanding and image generation within a single autoregressive framework. However, their ability to continually learn new tasks is severely hindered by catastrophic forgetting, both within a modality (intra-modal) and across modalities (inter-modal). While intra-modal forgetting has been studied in prior continual learning (CL) work, inter-modal forgetting remains largely unexplored. In this paper, we identify and empirically validate this phenomenon in UMGMs and provide a theoretical explanation rooted in gradient conflict between modalities. To address both intra- and inter-modal forgetting, we propose Modality-Decoupled Experts (MoDE), a lightweight and scalable architecture that isolates modality-specific updates to mitigate this gradient conflict and leverages knowledge distillation to prevent catastrophic forgetting and preserve pre-trained capabilities. Unlike previous CL methods, which remain modality-coupled and thus suffer from gradient conflict between modalities, MoDE explicitly decouples modalities to prevent interference. Experiments across diverse benchmarks demonstrate that MoDE significantly mitigates both inter- and intra-modal forgetting, outperforming prior CL baselines in unified multimodal generation settings. Code will be publicly available at https://github.com/Christina200/MoDE-official.git
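The core idea of decoupled, modality-isolated updates can be illustrated with a minimal sketch. This is not the paper's implementation; the expert parameterization (a per-feature gain), the hand-derived gradients, and all names here are illustrative assumptions. It shows only the routing principle: each modality owns its expert, and a gradient step on one modality's task leaves the other modality's parameters untouched.

```python
# Illustrative sketch of modality-decoupled expert routing (NOT the
# official MoDE implementation). Experts are simple per-feature gains,
# and the shared backbone is assumed frozen.

class ModalityDecoupledExperts:
    def __init__(self, dim, modalities=("text", "image")):
        # One lightweight expert per modality; here, a gain vector.
        self.dim = dim
        self.experts = {m: [1.0] * dim for m in modalities}

    def forward(self, h, modality):
        # Route the hidden state h through its modality's expert only.
        w = self.experts[modality]
        return [wi * hi for wi, hi in zip(w, h)]

    def update(self, h, grad_out, modality, lr=0.1):
        # Gradient-decoupled update: only the active modality's expert
        # receives gradients, so other modalities cannot interfere.
        w = self.experts[modality]
        for i in range(self.dim):
            w[i] -= lr * grad_out[i] * h[i]

mode = ModalityDecoupledExperts(dim=2)
image_before = list(mode.experts["image"])
mode.update(h=[1.0, 2.0], grad_out=[0.5, -0.5], modality="text")
assert mode.experts["image"] == image_before   # image expert untouched
assert mode.experts["text"] != [1.0, 1.0]      # text expert adapted
```

Because the update for a text task never writes to the image expert, the cross-modal gradient conflict the abstract describes cannot arise at the expert level; only shared (frozen or distillation-protected) parameters see both modalities.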