Uncovering Entity Identity Confusion in Multimodal Knowledge Editing

📅 2026-05-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses entity identity confusion (EIC) in multimodal knowledge editing, a failure mode in which text-only queries about the original entity return information about the newly edited-in entity. The study systematically identifies and analyzes EIC for the first time, tracing it to existing methods that fail to distinguish image-entity binding from entity-entity relational knowledge, which leads models to overfit entity-entity associations as a shortcut. To diagnose the issue, the authors introduce EC-Bench, a diagnostic benchmark that supports both behavioral analysis and probing of internal mechanisms. Building on this analysis, they explore constraining edits to the model's image-entity processing stage and report that doing so substantially reduces EIC. The paper closes with desiderata and methodological guidance for faithful, controllable multimodal knowledge editing.

📝 Abstract

Multimodal knowledge editing (MKE) aims to correct the internal knowledge of large vision-language models after deployment, yet the behavioral patterns of post-edit models remain underexplored. In this paper, we identify a systemic failure mode in edited models, termed Entity Identity Confusion (EIC): edited models exhibit an absurd behavior where text-only queries about the original entity's identity unexpectedly return information about the new entity. To rigorously investigate EIC, we construct EC-Bench, a diagnostic benchmark that directly probes how image-entity bindings shift before and after editing. Our analysis reveals that EIC stems from existing methods failing to distinguish between Image-Entity (I-E) binding and Entity-Entity (E-E) relational knowledge in the model, causing models to overfit E-E associations as a shortcut: the image is still perceived as the original entity, with the new entity's name serving only as a spurious identity label. We further explore potential mitigation strategies, showing that constraining edits to the model's I-E processing stage encourages edits to act more faithfully on I-E binding, thereby substantially reducing EIC. Based on these findings, we discuss principled desiderata for faithful MKE and provide methodological guidance for future research.
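
To make the failure mode described above concrete, here is a minimal sketch of how an EIC rate could be estimated from text-only queries. It is an illustration under stated assumptions, not the paper's metric or EC-Bench's protocol: the names `EditCase`, `eic_rate`, and `stub_answer` are hypothetical, the example entities are invented, and the substring check is a deliberately crude proxy for "the answer describes the new entity".

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class EditCase:
    """One edit, described only by what is needed for a text-only check (hypothetical schema)."""
    original_entity: str  # entity the image depicted before the edit
    new_entity: str       # entity the edit rebinds the image to
    text_query: str       # text-only question about the original entity


def eic_rate(cases: Iterable[EditCase], answer_fn: Callable[[str], str]) -> float:
    """Fraction of cases where a text-only query about the original entity
    is answered with the new entity's name (crude substring proxy)."""
    cases = list(cases)
    if not cases:
        return 0.0
    confused = sum(
        1 for c in cases
        if c.new_entity.lower() in answer_fn(c.text_query).lower()
    )
    return confused / len(cases)


# Invented example data; a real evaluation would query the edited model instead.
cases = [
    EditCase("Eiffel Tower", "Tokyo Tower", "What is the Eiffel Tower?"),
    EditCase("Big Ben", "Space Needle", "What is Big Ben?"),
]


def stub_answer(query: str) -> str:
    """Stand-in for an edited model: confused on the first case only."""
    if "Eiffel" in query:
        return "It is the Tokyo Tower, a landmark in Japan."
    return "A clock tower in London."


rate = eic_rate(cases, stub_answer)  # one of two cases is confused
```

In practice `answer_fn` would wrap the edited vision-language model with no image input, and the substring test would likely be replaced by a stricter judgment of whether the answer actually describes the new entity.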

Problem

Research questions and friction points this paper is trying to address.

Entity Identity Confusion

Multimodal Knowledge Editing

Vision-Language Models

Image-Entity Binding

Knowledge Editing

Innovation

Methods, ideas, or system contributions that make the work stand out.

Entity Identity Confusion

Multimodal Knowledge Editing

Image-Entity Binding