Uncovering Entity Identity Confusion in Multimodal Knowledge Editing

2026-05-07

AI Summary
This work addresses entity identity confusion (EIC) in multimodal knowledge editing, a failure mode in which text-only queries about an original entity erroneously return information associated with the newly edited entity. The study is the first to systematically identify and analyze EIC, showing that existing methods conflate image-entity bindings with inter-entity relationships and thereby induce shortcut learning in models. To diagnose the issue, the authors introduce EC-Bench, a diagnostic benchmark that supports both behavioral analysis and probing of internal mechanisms. Building on these insights, they propose constraining edits to the model's image-entity processing stage. Experimental results show that this strategy substantially reduces EIC, offering principled guidance toward faithful and controllable multimodal knowledge editing.
Abstract
Multimodal knowledge editing (MKE) aims to correct the internal knowledge of large vision-language models after deployment, yet the behavioral patterns of post-edit models remain underexplored. In this paper, we identify a systemic failure mode in edited models, termed Entity Identity Confusion (EIC): edited models exhibit an absurd behavior where text-only queries about the original entity's identity unexpectedly return information about the new entity. To rigorously investigate EIC, we construct EC-Bench, a diagnostic benchmark that directly probes how image-entity bindings shift before and after editing. Our analysis reveals that EIC stems from existing methods failing to distinguish between Image-Entity (I-E) binding and Entity-Entity (E-E) relational knowledge in the model, causing models to overfit E-E associations as a shortcut: the image is still perceived as the original entity, with the new entity's name serving only as a spurious identity label. We further explore potential mitigation strategies, showing that constraining edits to the model's I-E processing stage encourages edits to act more faithfully on I-E binding, thereby substantially reducing EIC. Based on these findings, we discuss principled desiderata for faithful MKE and provide methodological guidance for future research.
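The behavioral symptom described above can be illustrated with a small sketch. This is not the EC-Bench metric or the authors' code. It assumes a hypothetical `Probe` record holding the original entity, the new entity, and the edited model's answer to a text-only query about the original entity, and it uses a crude substring match as a stand-in for whatever judgment the benchmark actually applies.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    # All fields are hypothetical names chosen for this illustration.
    original_entity: str   # entity the image depicted before the edit
    new_entity: str        # entity the edit rebinds the image to
    text_only_answer: str  # edited model's answer to a text-only query
                           # about original_entity (no image supplied)


def shows_eic(probe: Probe) -> bool:
    """Crude proxy: the answer about the original entity names the new one."""
    return probe.new_entity.lower() in probe.text_only_answer.lower()


def eic_rate(probes: list[Probe]) -> float:
    """Fraction of probes whose text-only answer shows the confusion."""
    if not probes:
        return 0.0
    return sum(shows_eic(p) for p in probes) / len(probes)


# Made-up example answers, not results from the paper.
probes = [
    Probe("Eiffel Tower", "Tokyo Tower",
          "The Eiffel Tower is the Tokyo Tower, located in Japan."),
    Probe("Mount Fuji", "Mount Everest",
          "Mount Fuji is a volcano in Japan."),
]
print(eic_rate(probes))
```

A rate near zero would suggest the edit left text-only knowledge of the original entity intact, while a high rate would be consistent with the shortcut the abstract describes, where the new entity's name is attached to the original entity rather than to the image.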
Problem

Research questions and friction points this paper is trying to address.

Entity Identity Confusion
Multimodal Knowledge Editing
Vision-Language Models
Image-Entity Binding
Knowledge Editing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Entity Identity Confusion
Multimodal Knowledge Editing
Image-Entity Binding
EC-Bench
Vision-Language Models
Shu Wu
New Laboratory of Pattern Recognition (NLPR), State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), Institute of Automation, Chinese Academy of Sciences
Xiaotian Ye
Beijing University of Posts and Telecommunications
Natural Language Processing, Knowledge Representation, Large Language Models
Xinyu Mou
New Laboratory of Pattern Recognition (NLPR), State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences
Dongsheng Liu
New Laboratory of Pattern Recognition (NLPR), State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), Institute of Automation, Chinese Academy of Sciences; School of Advanced Interdisciplinary Sciences, University of Chinese Academy of Sciences
Xiaohan Wang
Huazhong University of Science and Technology
Mengqi Zhang
Shandong University
Large Language Models, Data Mining, Knowledge Representation Learning