DualEdit: Dual Editing for Knowledge Updating in Vision-Language Models

📅 2025-06-16

📈 Citations: 0

✨ Influential: 0

career value

174K/year

🤖 AI Summary

To address the inefficiency of full retraining for knowledge updating in pre-trained vision-language models (VLMs), this paper proposes a dual-modal collaborative editing framework. It is the first to reveal the heterogeneity between textual and visual pathways in VLMs regarding layer-wise edit sensitivity, and accordingly designs a cross-modal hierarchical editing mechanism alongside a text-guided learnable gating module to enable precise, parameter-level intervention at critical sensitive layers. The method is architecture-agnostic—compatible with diverse VLM backbones—requires no fine-tuning or auxiliary data. Evaluated on multiple benchmarks, it significantly outperforms state-of-the-art VLMs and transfer-based LLM editing approaches: knowledge correction accuracy improves by up to 18.7%, forgetting rate decreases by 42%, and original model capabilities are robustly preserved.

Technology Category

Application Category

📝 Abstract

Model editing aims to efficiently update a pre-trained model's knowledge without the need for time-consuming full retraining. While existing pioneering editing methods achieve promising results, they primarily focus on editing single-modal language models (LLMs). However, for vision-language models (VLMs), which involve multiple modalities, the role and impact of each modality on editing performance remain largely unexplored. To address this gap, we explore the impact of textual and visual modalities on model editing and find that: (1) textual and visual representations reach peak sensitivity at different layers, reflecting their varying importance; and (2) editing both modalities can efficiently update knowledge, but this comes at the cost of compromising the model's original capabilities. Based on our findings, we propose DualEdit, an editor that modifies both textual and visual modalities at their respective key layers. Additionally, we introduce a gating module within the more sensitive textual modality, allowing DualEdit to efficiently update new knowledge while preserving the model's original information. We evaluate DualEdit across multiple VLM backbones and benchmark datasets, demonstrating its superiority over state-of-the-art VLM editing baselines as well as adapted LLM editing methods on different evaluation metrics.

Problem

Research questions and friction points this paper is trying to address.

Explores impact of text and vision modalities on model editing

Proposes DualEdit to update knowledge while preserving original capabilities

Evaluates DualEdit's superiority over existing VLM and LLM editing methods

Innovation

Methods, ideas, or system contributions that make the work stand out.

Edits both textual and visual modalities

Uses key layers for each modality

Incorporates gating module for sensitive text

🔎 Similar Papers

Attribution Analysis Meets Model Editing: Advancing Knowledge Correction in Vision Language Models with VisEdit