🤖 AI Summary
Existing knowledge editing methods predominantly perform unimodal, isolated updates, neglecting both the intrinsic multimodal coupling of Large Vision-Language Models (LVLMs) and the continuity of knowledge evolution, which leads to insufficient cross-modal collaborative editing and poor long-term consistency. To address this, we propose MemEIC, the first framework enabling continual compositional knowledge editing. MemEIC introduces a hybrid external-internal editing architecture: two external memory banks support cross-modal evidence retrieval; dual LoRA adapters enable modality-decoupled parameter updates; and a selectively activated, brain-inspired knowledge connector dynamically regulates vision-language fusion and compositional reasoning. Evaluated on multimodal question answering, our approach significantly surpasses state-of-the-art methods, achieving high editing accuracy, strong retention of historical edits, and robust cross-turn knowledge accumulation. This work establishes a new benchmark for continual compositional knowledge editing in LVLMs.
📝 Abstract
The dynamic nature of information necessitates continuously updating large vision-language models (LVLMs). While recent knowledge editing techniques hint at promising directions, they often focus on editing a single modality (vision or language) in isolation. This prevalent practice neglects the inherent multimodality of LVLMs and the continuous nature of knowledge updates, potentially leading to suboptimal editing outcomes when considering the interplay between modalities and the need for ongoing knowledge refinement. To address these limitations, we propose MemEIC, a novel method for Continual and Compositional Knowledge Editing (CCKE) in LVLMs. MemEIC enables compositional editing of both visual and textual knowledge sequentially. Our approach employs a hybrid external-internal editor featuring a dual external memory for cross-modal evidence retrieval and dual LoRA adapters that facilitate disentangled parameter updates for each modality. A key component is a brain-inspired knowledge connector, activated selectively for compositional reasoning, that integrates information across different modalities. Experiments demonstrate that MemEIC significantly improves performance on complex multimodal questions and effectively preserves prior edits, setting a new benchmark for CCKE in LVLMs.
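To make the dual-adapter idea concrete, here is a minimal NumPy sketch of the general mechanism the abstract describes: a frozen base layer, two low-rank (LoRA-style) updates kept separate per modality, and a scalar gate standing in for the connector that is activated only for compositional queries. All names, dimensions, and the gating scheme are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 16, 2  # hidden size and LoRA rank (illustrative, not from the paper)

W = rng.normal(size=(d, d)) * 0.02  # frozen base weight, untouched by edits

def make_lora(d, r, scale=1.0):
    # Standard LoRA parameterization: delta_W = scale * B @ A.
    # B is zero-initialized, so an untrained adapter is a no-op.
    A = rng.normal(size=(r, d)) * 0.01
    B = np.zeros((d, r))
    return A, B, scale

vis_A, vis_B, s_v = make_lora(d, r)  # adapter holding visual edits
txt_A, txt_B, s_t = make_lora(d, r)  # adapter holding textual edits

def forward(x, use_vision, use_text, gate=1.0):
    """Base layer plus modality-specific low-rank deltas.

    `gate` is a stand-in for the knowledge connector: a compositional
    question would set it high to fuse both adapters, while a unimodal
    question would engage only the relevant adapter.
    """
    y = x @ W.T
    if use_vision:
        y = y + gate * s_v * (x @ vis_A.T) @ vis_B.T
    if use_text:
        y = y + gate * s_t * (x @ txt_A.T) @ txt_B.T
    return y
```

Because each modality's edits live in a separate low-rank pair, updating the textual adapter leaves the visual adapter's parameters (and the frozen base) bit-for-bit unchanged, which is the disentanglement the abstract refers to.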