Hybrid-DMKG: A Hybrid Reasoning Framework over Dynamic Multimodal Knowledge Graphs for Multimodal Multihop QA with Knowledge Editing

📅 2025-11-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing multimodal knowledge editing (MKE) benchmarks evaluate only final-answer correctness. They neglect both the quality of intermediate reasoning and robustness to visual rephrasing. This work introduces MMQAKE, the first knowledge editing benchmark for multimodal multihop question answering. MMQAKE also assesses the accuracy of each intermediate reasoning step and robustness to visually rephrased inputs. The work further proposes Hybrid-DMKG, a dual-path hybrid reasoning framework built on a dynamic multimodal knowledge graph. It combines LLM-driven question decomposition, multimodal retrieval to locate updated facts, relation linking prediction, and parallel retrieval-augmented generation (RAG) reasoning with vision-language models (VLMs). A decision module then aggregates the evidence from both paths. Experiments on MMQAKE report significant improvements: +12.3% in multihop QA accuracy and a 41.7% reduction in performance degradation under visual rephrasing. These results support the framework's effectiveness for complex reasoning over dynamically updated knowledge.
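A small sketch shows the core idea of editing one fact in a knowledge graph and letting later hops pick up the change. It makes several assumptions. The store holds text-only triples, so the images attached to DMKG entities are omitted. A hard-coded relation sequence stands in for the LLM-generated sub-questions. The names `DynamicKG`, `edit`, and `answer_chain` are hypothetical. This is not the paper's implementation.

```python
from typing import Optional


class DynamicKG:
    """Toy dynamic KG mapping (subject, relation) -> object.

    Hypothetical illustration only: entity images are omitted and
    this is not the paper's data model.
    """

    def __init__(self) -> None:
        self.facts: dict[tuple[str, str], str] = {}

    def add(self, subj: str, rel: str, obj: str) -> None:
        self.facts[(subj, rel)] = obj

    def edit(self, subj: str, rel: str, new_obj: str) -> Optional[str]:
        """Apply a knowledge edit by overwriting one triple; return the old object."""
        old = self.facts.get((subj, rel))
        self.facts[(subj, rel)] = new_obj
        return old

    def link(self, subj: str, rel: str) -> Optional[str]:
        """Follow one edge; None if the KG has no such fact."""
        return self.facts.get((subj, rel))


def answer_chain(kg: DynamicKG, start: str, relations: list[str]) -> list[Optional[str]]:
    """Resolve sub-questions hop by hop: each answer becomes the next subject."""
    answers: list[Optional[str]] = []
    entity: Optional[str] = start
    for rel in relations:
        entity = kg.link(entity, rel) if entity is not None else None
        answers.append(entity)
    return answers


kg = DynamicKG()
kg.add("Eiffel Tower", "located_in", "Paris")
kg.add("Paris", "country", "France")
kg.add("Rome", "country", "Italy")
kg.add("France", "official_language", "French")
kg.add("Italy", "official_language", "Italian")

chain = ["located_in", "country", "official_language"]
print(answer_chain(kg, "Eiffel Tower", chain))  # ['Paris', 'France', 'French']
kg.edit("Eiffel Tower", "located_in", "Rome")   # counterfactual edit
print(answer_chain(kg, "Eiffel Tower", chain))  # ['Rome', 'Italy', 'Italian']
```

The edit overwrites a single triple, so every later hop that passes through it also changes. Each intermediate answer is returned, which mirrors the per-step view of reasoning that the benchmark evaluates.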

📝 Abstract
Multimodal Knowledge Editing (MKE) extends traditional knowledge editing to settings involving both textual and visual modalities. However, existing MKE benchmarks primarily assess final-answer correctness while neglecting the quality of intermediate reasoning and robustness to visually rephrased inputs. To address this limitation, we introduce MMQAKE, the first benchmark for multimodal multihop question answering with knowledge editing. MMQAKE evaluates (1) a model's ability to reason over 2- to 5-hop factual chains that span both text and images, including performance at each intermediate step, and (2) robustness to visually rephrased inputs in multihop questions. Our evaluation shows that current MKE methods often struggle to consistently update and reason over multimodal reasoning chains after knowledge edits. To overcome these challenges, we propose Hybrid-DMKG, a hybrid reasoning framework built on a dynamic multimodal knowledge graph (DMKG) to enable accurate multihop reasoning over updated multimodal knowledge. Hybrid-DMKG first uses a large language model to decompose multimodal multihop questions into sequential sub-questions. It then applies a multimodal retrieval model to locate updated facts by jointly encoding each sub-question with candidate entities and their associated images. For answer inference, a hybrid reasoning module operates over the DMKG via two parallel paths: (1) relation linking prediction, and (2) RAG reasoning with large vision-language models. A decision module aggregates evidence from both paths to select the most credible answer. Experimental results on MMQAKE show that Hybrid-DMKG significantly outperforms existing MKE approaches, achieving higher accuracy and improved robustness to knowledge updates.
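The retrieval step jointly encodes a sub-question with candidate entities and their images. It can be approximated as nearest-neighbour search over fused embeddings. The sketch below rests on several assumptions:
- Vectors are precomputed. Toy 3-d values stand in for real encoder outputs.
- Text and image embeddings are fused by a simple weighted average.
- Cosine similarity serves as the score.

`Candidate`, `fuse`, and `retrieve` are hypothetical names. The paper's retrieval model may encode each pair jointly rather than fusing separately computed embeddings.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """A candidate entity with precomputed embeddings (hypothetical structure)."""
    name: str
    text_emb: list[float]   # embedding of the entity's textual description
    image_emb: list[float]  # embedding of the entity's associated image


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def fuse(text_emb: list[float], image_emb: list[float], alpha: float = 0.5) -> list[float]:
    """Assumed fusion: weighted average of the text and image embeddings."""
    return [alpha * t + (1 - alpha) * v for t, v in zip(text_emb, image_emb)]


def retrieve(query_emb: list[float], candidates: list[Candidate], k: int = 1) -> list[str]:
    """Return the k candidates whose fused embedding is closest to the sub-question."""
    ranked = sorted(
        candidates,
        key=lambda c: cosine(query_emb, fuse(c.text_emb, c.image_emb)),
        reverse=True,
    )
    return [c.name for c in ranked[:k]]


cands = [
    Candidate("Eiffel Tower", [1.0, 0.0, 0.0], [0.9, 0.1, 0.0]),
    Candidate("Big Ben", [0.0, 1.0, 0.0], [0.1, 0.9, 0.0]),
]
print(retrieve([0.95, 0.05, 0.0], cands))  # ['Eiffel Tower']
```

Scoring against the fused vector lets a candidate's image pull it toward or away from a sub-question. The `alpha` weight controls how much the image contributes relative to the text.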
Problem

Develops a benchmark for multimodal multihop QA with knowledge editing
Evaluates reasoning over multimodal chains and robustness to visual rephrasing
Proposes a hybrid reasoning framework for accurate multihop reasoning after edits
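Writing the metrics out for predicted and gold answer chains shows why per-step evaluation matters. The definitions below are assumptions for illustration:
- Scoring uses exact match.
- A chain counts as correct only if every hop matches.
- Degradation is measured as a relative drop in accuracy.

MMQAKE's official metric definitions may differ.

```python
def final_answer_accuracy(preds: list[list[str]], golds: list[list[str]]) -> float:
    """Share of questions whose last-hop prediction matches the gold final answer."""
    hits = sum(p[-1] == g[-1] for p, g in zip(preds, golds))
    return hits / len(golds)


def hop_wise_accuracy(preds: list[list[str]], golds: list[list[str]]) -> float:
    """Share of questions whose every intermediate answer matches the gold chain."""
    hits = sum(p == g for p, g in zip(preds, golds))
    return hits / len(golds)


def relative_degradation(acc_original: float, acc_rephrased: float) -> float:
    """Relative accuracy drop on visually rephrased inputs (assumed definition)."""
    if acc_original == 0:
        return 0.0
    return (acc_original - acc_rephrased) / acc_original


golds = [["Rome", "Italy", "Italian"], ["Paris", "France", "French"]]
preds = [["Paris", "Italy", "Italian"], ["Paris", "France", "French"]]  # hop 1 wrong in chain 1
print(final_answer_accuracy(preds, golds))  # 1.0
print(hop_wise_accuracy(preds, golds))      # 0.5
print(relative_degradation(0.5, 0.25))      # 0.5
```

The first chain reaches the right final answer through a wrong first hop. Final-answer accuracy alone (1.0) therefore overstates the quality of the reasoning compared with the hop-wise score (0.5).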
Innovation

Hybrid reasoning framework with dynamic multimodal knowledge graph
Decomposes questions into sub-questions using large language model
Combines relation linking and RAG reasoning for answer inference
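The abstract does not detail how the decision module weighs the two paths. The following is therefore only one plausible rule, stated as an assumption:
- If both paths agree, accept their shared answer.
- Otherwise, return the answer from the path with the higher confidence.
- On ties, prefer the relation-linking path.

`PathResult` and `decide` are hypothetical names. The confidence scores stand in for whatever evidence the real module aggregates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PathResult:
    """Output of one reasoning path (hypothetical structure)."""
    answer: Optional[str]  # None when the path found no answer
    confidence: float      # assumed score in [0, 1]


def decide(kg: PathResult, rag: PathResult) -> Optional[str]:
    """Pick the more credible answer from the relation-linking (kg) and RAG paths."""
    # Both paths agree: accept the shared answer.
    if kg.answer is not None and kg.answer == rag.answer:
        return kg.answer
    # Otherwise keep only the paths that produced an answer.
    candidates = [r for r in (kg, rag) if r.answer is not None]
    if not candidates:
        return None
    # max() returns the first maximal item, so the kg path wins ties.
    return max(candidates, key=lambda r: r.confidence).answer


print(decide(PathResult("Italy", 0.9), PathResult("France", 0.6)))  # Italy
print(decide(PathResult(None, 0.0), PathResult("Italian", 0.7)))    # Italian
print(decide(PathResult("Rome", 0.5), PathResult("Paris", 0.5)))    # Rome
```

This sketch prefers the relation-linking path on ties because that path reads edited triples directly from the graph. A VLM, in contrast, may fall back on outdated knowledge stored in its parameters.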
Li Yuan
Research Associate, University of Science & Technology of China (USTC)
Antibiotic resistance, Wastewater treatment, Environmental bioremediation, Anaerobic digestion, Fate of organic pollutants
Qingfei Huang
School of Software Engineering, South China University of Technology, Guangzhou, China
Bingshan Zhu
School of Big Data and Artificial Intelligence, Guangdong University of Finance & Economics
Yi Cai
School of Software Engineering, South China University of Technology, Guangzhou, China
Qingbao Huang
Guangxi University
Changmeng Zheng
Research Assistant Professor, The Hong Kong Polytechnic University
Natural Language Processing, Social Media Text Mining, Multimodal Data Analysis
Zikun Deng
School of Software Engineering, South China University of Technology, Guangzhou, China
Tao Wang
Department of Biostatistics & Health Informatics, King’s College London, London, United Kingdom