REAL: Resolving Knowledge Conflicts in Knowledge-Intensive Visual Question Answering via Reasoning-Pivot Alignment

📅 2026-02-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses conflicting external knowledge introduced by open-domain retrieval in knowledge-intensive visual question answering (KI-VQA), where existing methods lack a general mechanism for detecting and resolving such conflicts. The paper proposes REAL (Reasoning-Pivot Alignment), a framework built on the “Reasoning-Pivot”: an atomic knowledge unit (a node or edge) in the reasoning chain that typically relies on external evidence. REAL aligns conflicts with these pivots in order to identify and mitigate them. The authors construct the REAL-VQA dataset and develop two components: Reasoning-Pivot Aware SFT (RPA-SFT), which trains a generalizable conflict discriminator, and Reasoning-Pivot Guided Decoding (RPGD), an intra-model decoding strategy for targeted conflict mitigation. The reported experiments show significantly improved conflict discrimination accuracy and state-of-the-art performance on multiple VQA benchmarks.

📝 Abstract
Knowledge-intensive Visual Question Answering (KI-VQA) frequently suffers from severe knowledge conflicts caused by the inherent limitations of open-domain retrieval. However, existing paradigms face critical limitations due to the lack of generalizable conflict detection and intra-model constraint mechanisms to handle conflicting evidence. To address these challenges, we propose the REAL (Reasoning-Pivot Alignment) framework centered on the novel concept of the Reasoning-Pivot. Distinct from reasoning steps that prioritize internal self-derivation, a reasoning-pivot serves as an atomic unit (node or edge) in the reasoning chain that emphasizes knowledge linkage, and it typically relies on external evidence to complete the reasoning. Supported by our constructed REAL-VQA dataset, our approach integrates Reasoning-Pivot Aware SFT (RPA-SFT) to train a generalizable discriminator by aligning conflicts with pivot extraction, and employs Reasoning-Pivot Guided Decoding (RPGD), an intra-model decoding strategy that leverages these pivots for targeted conflict mitigation. Extensive experiments across diverse benchmarks demonstrate that REAL significantly enhances discrimination accuracy and achieves state-of-the-art performance, validating the effectiveness of our pivot-driven resolution paradigm.
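To make the reasoning-pivot idea concrete, the following is a toy sketch, not the paper's implementation. It assumes a pivot can be modeled as an edge (subject, relation) whose value must come from retrieved evidence, and it flags a conflict when sources disagree on that value. The names `Pivot`, `Claim`, and `find_conflicts`, the passage format, and the sample data are all hypothetical. Exact string comparison stands in for the trained RPA-SFT discriminator, which the abstract does not specify in detail.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Pivot:
    """Hypothetical atomic knowledge unit: an edge (subject, relation)
    whose value has to be supplied by external evidence."""
    subject: str
    relation: str


@dataclass(frozen=True)
class Claim:
    """What one retrieved passage asserts about a pivot (assumed format)."""
    source: str
    pivot: Pivot
    value: str


def find_conflicts(claims):
    """Group claims by pivot and return the pivots whose sources disagree.

    Values are compared after trimming and lowercasing; this is a crude
    stand-in for a learned conflict discriminator.
    """
    values_by_pivot = defaultdict(set)
    for claim in claims:
        values_by_pivot[claim.pivot].add(claim.value.strip().lower())
    return {
        pivot: sorted(values)
        for pivot, values in values_by_pivot.items()
        if len(values) > 1
    }


# Made-up retrieval results for illustration only.
claims = [
    Claim("doc_a", Pivot("Eiffel Tower", "completed_in"), "1889"),
    Claim("doc_b", Pivot("Eiffel Tower", "completed_in"), "1887"),
    Claim("doc_c", Pivot("Eiffel Tower", "located_in"), "Paris"),
    Claim("doc_d", Pivot("Eiffel Tower", "located_in"), "paris"),
]
conflicts = find_conflicts(claims)
for pivot, values in conflicts.items():
    print(f"conflict on {pivot.subject} / {pivot.relation}: {values}")
```

In this sketch only the `completed_in` pivot is flagged, since the two `located_in` values agree after normalization. A decoding stage in the spirit of RPGD would then focus its mitigation on the flagged pivots rather than on the whole retrieved context.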

Problem

Research questions and friction points this paper is trying to address.

Knowledge-intensive Visual Question Answering
knowledge conflicts
open-domain retrieval
conflict detection
evidence conflict

Innovation

Methods, ideas, or system contributions that make the work stand out.

Reasoning-Pivot
Knowledge Conflict Resolution
Knowledge-Intensive VQA
Pivot-Guided Decoding
Supervised Fine-Tuning

Authors

Kai Ye (Zhejiang University, Hangzhou, China)
Xianwei Mao (Zhejiang University, Hangzhou, China)
Sheng Zhou (Zhejiang University; research interests: Data Mining)
Zirui Shao (Zhejiang University)
Ye Mo (Zhejiang University)
Liangliang Liu (Alibaba Group, Hangzhou, China)
Haikuan Huang (Alibaba Group, Hangzhou, China)
Bin Li (Zhejiang University; research interests: Federated Learning, Distributed Optimization, LLM)
Jiajun Bu (Zhejiang University, Hangzhou, China)