Lifelong Knowledge Editing for Vision Language Models with Low-Rank Mixture-of-Experts

Date: 2024-11-23
Venue: arXiv.org
Citations: 1
Influential citations: 0
AI Summary
To address the challenges of continual error correction, outdated information updating, and novel data integration in vision-language large models (VLLMs) during lifelong knowledge editing, this paper proposes LiveEdit, the first lifelong editing framework tailored for VLLMs. Methodologically, it introduces: (1) a low-rank expert generator enabling parameter-efficient, dynamically scalable editing; (2) a synergistic mechanism combining vision-semantic hard filtering with text-semantic soft routing to ensure cross-modal editing precision; and (3) the first dedicated benchmark for lifelong editing of VLLMs. Extensive experiments demonstrate that LiveEdit significantly improves editing accuracy over baselines. Ablation studies validate the effectiveness of each component, while further analyses confirm its strong robustness and cross-task generalization capability.

πŸ“ Abstract
Model editing aims to correct inaccurate knowledge, update outdated information, and incorporate new data into Large Language Models (LLMs) without the need for retraining. This task poses challenges in lifelong scenarios where edits must be continuously applied for real-world applications. While some editors demonstrate strong robustness for lifelong editing in pure LLMs, Vision LLMs (VLLMs), which incorporate an additional vision modality, are not directly adaptable to existing LLM editors. In this paper, we propose LiveEdit, a LIfelong Vision language modEl Edit to bridge the gap between lifelong LLM editing and VLLMs. We begin by training an editing expert generator to independently produce low-rank experts for each editing instance, with the goal of correcting the relevant responses of the VLLM. A hard filtering mechanism is developed to utilize visual semantic knowledge, thereby coarsely eliminating visually irrelevant experts for input queries during the inference stage of the post-edited model. Finally, to integrate visually relevant experts, we introduce a soft routing mechanism based on textual semantic relevance to achieve multi-expert fusion. For evaluation, we establish a benchmark for lifelong VLLM editing. Extensive experiments demonstrate that LiveEdit offers significant advantages in lifelong VLLM editing scenarios. Further experiments validate the rationality and effectiveness of each module design in LiveEdit.

Problem

Research questions and friction points this paper is trying to address.

Corrects inaccurate knowledge in Vision Language Models
Updates outdated information without retraining
Incorporates new data for lifelong editing scenarios

Innovation

Methods, ideas, or system contributions that make the work stand out.

Low-rank mixture-of-experts for VLLM editing
Hard filtering mechanism using visual semantics
Soft routing for multi-expert fusion
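
The three ideas listed above can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the class `LowRankExpert`, the function `apply_experts`, the use of cosine similarity, the threshold `vis_threshold`, and the softmax with `temperature` are all assumptions chosen for illustration. Each expert is a rank-r update `U @ (V @ h)` tied to one edit; experts whose visual key is dissimilar to the query image are dropped (hard filter), and the remaining ones are blended with weights derived from textual similarity (soft routing).

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


class LowRankExpert:
    """Hypothetical expert for one edit: a rank-r update U @ (V @ h).

    U has shape (d, r) and V has shape (r, d). vis_key and txt_key are
    assumed embeddings of the edit's image and prompt, used only for routing.
    """

    def __init__(self, U, V, vis_key, txt_key):
        self.U, self.V = U, V
        self.vis_key, self.txt_key = vis_key, txt_key

    def delta(self, h):
        # Project down to rank r, then back up to dimension d.
        return matvec(self.U, matvec(self.V, h))


def apply_experts(experts, h, vis_query, txt_query,
                  vis_threshold=0.5, temperature=0.1):
    """Return (edited hidden state, routing weights of the kept experts)."""
    # Hard filter (assumed form): drop experts whose visual key is not
    # similar enough to the query's visual embedding.
    kept = [e for e in experts if cosine(e.vis_key, vis_query) >= vis_threshold]
    if not kept:
        return list(h), []  # no relevant edit: leave the state unchanged

    # Soft routing (assumed form): softmax over textual similarity.
    scores = [cosine(e.txt_key, txt_query) / temperature for e in kept]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [x / total for x in exps]

    # Fuse the kept experts as a weighted sum of their low-rank updates.
    out = list(h)
    for w, e in zip(weights, kept):
        out = [o + w * d for o, d in zip(out, e.delta(h))]
    return out, weights


# Two toy rank-1 experts in a 2-dimensional hidden space.
expert_a = LowRankExpert([[1.0], [0.0]], [[1.0, 0.0]], [1.0, 0.0], [1.0, 0.0])
expert_b = LowRankExpert([[0.0], [1.0]], [[0.0, 1.0]], [0.0, 1.0], [0.0, 1.0])
edited, weights = apply_experts([expert_a, expert_b], [1.0, 2.0],
                                vis_query=[1.0, 0.0], txt_query=[1.0, 0.0])
```

In this toy setup only `expert_a` passes the visual filter, so it receives the full routing weight. Because every edit adds its own small pair of matrices rather than modifying shared weights, new edits can be appended without disturbing earlier ones, which is the property a lifelong setting needs.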

Authors

Qizhou Chen, ECNU (Natural Language Processing, Computer Vision)
Chengyu Wang, Alibaba Group (Natural Language Processing, Large Language Model, Multi-modal Learning)
Dakan Wang, Exacity Inc., Shanghai, China
Taolin Zhang, Hefei University of Technology (LLM, VLLM, Deep Learning)
Wangyue Li, East China Normal University, Shanghai, China
Xiaofeng He, East China Normal University, Shanghai, China