Benchmarking Multimodal Knowledge Conflict for Large Multimodal Models

📅 2025-05-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing multimodal evaluation benchmarks overlook realistic scenarios involving multimodal knowledge conflicts, particularly lacking systematic investigation of context-memory conflicts and inter-context conflicts. To address this gap, we introduce MMKC-Bench, a dedicated benchmark for evaluating knowledge conflict handling in large multimodal models (LMMs), comprising 1,573 knowledge instances and 3,381 images. We propose a multi-stage data construction paradigm integrating automated web crawling, rule-based generation, and rigorous human verification, and formally define two conflict scenarios (context-memory and inter-context) spanning three types of multimodal knowledge conflict. Our evaluation framework jointly assesses conflict detection capability and model behavioral responses. Extensive experiments across three major LMM families reveal that while models exhibit basic conflict identification ability, they strongly favor internal parametric knowledge over externally retrieved evidence, exposing limitations in retrieval-augmented generation (RAG) integration. MMKC-Bench is publicly released to advance research on trustworthy multimodal reasoning.

📝 Abstract
Large Multimodal Models (LMMs) face notable challenges when encountering multimodal knowledge conflicts, particularly under retrieval-augmented generation (RAG) frameworks, where contextual information from external sources may contradict the model's internal parametric knowledge, leading to unreliable outputs. However, existing benchmarks fail to reflect such realistic conflict scenarios. Most focus solely on intra-memory conflicts, while context-memory and inter-context conflicts remain largely uninvestigated. Furthermore, factual knowledge conflicts are often overlooked in commonly used evaluations, and existing datasets lack a thorough investigation of conflict detection capabilities. To bridge this gap, we propose MMKC-Bench, a benchmark designed to evaluate factual knowledge conflicts in both context-memory and inter-context scenarios. MMKC-Bench encompasses three types of multimodal knowledge conflicts and includes 1,573 knowledge instances and 3,381 images across 23 broad types, collected through automated pipelines with human verification. We evaluate three representative series of LMMs on both model behavior analysis and conflict detection tasks. Our findings show that while current LMMs are capable of recognizing knowledge conflicts, they tend to favor internal parametric knowledge over external evidence. We hope MMKC-Bench will foster further research on multimodal knowledge conflict and enhance the development of multimodal RAG systems. The source code is available at https://github.com/MLLMKCBENCH/MLLMKC.
Problem

Research questions and friction points this paper is trying to address.

Evaluating multimodal knowledge conflicts in LMMs
Addressing gaps in context-memory and inter-context conflict benchmarks
Assessing conflict detection capabilities in retrieval-augmented generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposes MMKC-Bench for knowledge conflict evaluation
Includes 1,573 knowledge instances and 3,381 images
Evaluates LMMs on behavior and conflict detection
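The behavioral analysis above asks whether a model, when shown conflicting external evidence, follows the context or sticks with its parametric memory. A minimal sketch of such a scoring step is below; the record fields and the substring-matching heuristic are illustrative assumptions, not MMKC-Bench's actual protocol.

```python
# Hypothetical behavioral scoring for knowledge-conflict evaluation.
# Each record holds the model's answer, the answer supported by the
# (counterfactual) context, and the model's original parametric answer.
# Field names and the matching heuristic are assumptions for illustration.

def classify_behavior(answer: str, context_answer: str, memory_answer: str) -> str:
    """Label an answer as context-following, memory-following, or other."""
    a = answer.strip().lower()
    if context_answer.lower() in a:
        return "context"
    if memory_answer.lower() in a:
        return "memory"
    return "other"

def memory_reliance_rate(records) -> float:
    """Fraction of conflict instances where the model kept its parametric answer."""
    labels = [
        classify_behavior(r["answer"], r["context_answer"], r["memory_answer"])
        for r in records
    ]
    return labels.count("memory") / len(labels)

records = [
    {"answer": "The Eiffel Tower is in Rome.", "context_answer": "Rome", "memory_answer": "Paris"},
    {"answer": "It is located in Paris.", "context_answer": "Rome", "memory_answer": "Paris"},
]
```

A high memory-reliance rate on counterfactual contexts would correspond to the paper's finding that LMMs favor internal parametric knowledge over external evidence.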
👥 Authors
Yifan Jia · Joint SDU-NTU Centre for Artificial Intelligence Research & School of Software, Shandong University
Kailin Jiang · University of Science and Technology of China
Yuyang Liang · Joint SDU-NTU Centre for Artificial Intelligence Research & School of Software, Shandong University
Qihan Ren · Shanghai Jiao Tong University (Explainable AI, Machine Learning, Computer Vision, Natural Language Processing)
Yi Xin · California Institute of Technology (Industrial Organization, Econometrics)
Rui Yang · Joint SDU-NTU Centre for Artificial Intelligence Research & School of Software, Shandong University
Fenze Feng · Joint SDU-NTU Centre for Artificial Intelligence Research & School of Software, Shandong University
Mingcai Chen · Nanjing University of Posts and Telecommunications (weakly supervised learning, open-environment machine learning)
Hengyang Lu · Jiangnan University
Haozhe Wang · The Hong Kong University of Science and Technology
Xiaoye Qu · Shanghai AI Lab
Dongrui Liu · Shanghai AI Laboratory
Lizhen Cui · Shandong University (Databases, Big Data, Artificial Intelligence, Data Mining, Cloud computing)
Yuntao Du · Purdue University (Privacy)