ReasonEdit: Editing Vision-Language Models using Human Reasoning

📅 2026-02-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing methods for editing vision-language models struggle to correct complex reasoning errors without inadvertently altering other model behaviors. To address this challenge, this work proposes ReasonEdit, a novel editing paradigm that, for the first time, incorporates human-like reasoning explanations into vision-language model editing. ReasonEdit leverages a multimodal embedding mechanism comprising human reasoning encoding, a codebook-based memory store, and topological balancing to dynamically retrieve relevant knowledge during inference, enabling precise and generalizable edits. Evaluated across four state-of-the-art vision-language models and multiple reasoning-based visual question answering benchmarks, ReasonEdit substantially outperforms existing approaches, achieving state-of-the-art performance in model editing fidelity and robustness.

📝 Abstract
Model editing aims to correct errors in large, pretrained models without altering unrelated behaviors. While some recent works have edited vision-language models (VLMs), no existing editors tackle reasoning-heavy tasks, which typically require humans and models to reason about images. We therefore propose ReasonEdit, the first VLM editor to let users explain their reasoning during editing, introducing a new, practical model editing setup. ReasonEdit continuously stores human reasoning in a codebook, and retrieves only relevant facts during inference using a novel topology-balanced multimodal embedding method inspired by network science. Across four VLMs on multiple rationale-based visual question answering datasets, ReasonEdit achieves state-of-the-art editing performance, ultimately showing that using human reasoning during editing greatly improves edit generalization.
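The mechanism the abstract outlines — store each human-provided reasoning trace as a memory entry, then retrieve only relevant entries at inference time — can be sketched minimally. Everything below is an illustrative assumption, not the paper's method: the class and function names (`CodebookMemory`, `embed`, `retrieve`) are hypothetical, and the toy hashed bag-of-words embedding stands in for the paper's learned, topology-balanced multimodal embeddings.

```python
# Hypothetical sketch of codebook-style edit memory with similarity-gated
# retrieval. All names and the toy embedding are illustrative assumptions.
import math

def embed(text):
    # Toy stand-in for a multimodal embedding: hashed bag-of-words,
    # L2-normalized so dot product equals cosine similarity.
    vec = [0.0] * 16
    for tok in text.lower().split():
        vec[hash(tok) % 16] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b))

class CodebookMemory:
    """Stores (key embedding, human reasoning, corrected answer) entries."""

    def __init__(self, threshold=0.5):
        self.entries = []           # accumulated edits
        self.threshold = threshold  # below this, defer to the base VLM

    def add_edit(self, query, reasoning, answer):
        # Key the entry on both the query and the human's reasoning,
        # so related-but-rephrased questions can still match.
        key = embed(query + " " + reasoning)
        self.entries.append((key, reasoning, answer))

    def retrieve(self, query):
        # Return the most similar stored edit, or None when nothing
        # is relevant enough (i.e., leave unrelated behavior untouched).
        key = embed(query)
        best = max(self.entries, key=lambda e: cosine(key, e[0]), default=None)
        if best is None or cosine(key, best[0]) < self.threshold:
            return None
        return best[1], best[2]
```

The similarity threshold is the locality knob: a retrieval miss means the base model answers unchanged, which is how an editor of this shape avoids altering unrelated behaviors while still generalizing stored edits to paraphrased queries.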
Problem

Research questions and friction points this paper is trying to address.

vision-language models
model editing
reasoning
visual question answering
human reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

model editing
vision-language models
human reasoning
multimodal embedding
topology-balanced retrieval
Jiaxing Qiu
University of Virginia, Charlottesville, VA, USA
Kaihua Hou
University of California, Berkeley, Berkeley, CA, USA
Roxana Daneshjou
Stanford University, Stanford, CA, USA
Ahmed M. Alaa
Assistant Professor, UC Berkeley and UCSF
Thomas Hartvigsen
University of Virginia, Charlottesville, VA, USA