CC-VQA: Conflict- and Correlation-Aware Method for Mitigating Knowledge Conflict in Knowledge-Based Visual Question Answering

📅 2026-02-27
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the conflict between the static parametric knowledge of vision-language models and dynamically retrieved information in knowledge-based visual question answering (KB-VQA) by proposing a training-free, conflict- and correlation-aware method. It is the first to integrate visual cues into conflict analysis, combining a vision-centric contextual conflict reasoning module, correlation-guided positional encoding compression, and an adaptive decoding mechanism to identify and mitigate knowledge inconsistencies. Key contributions include visual-semantic conflict modeling, correlation-weighted conflict scoring, and a matching decoding strategy, overcoming the limitations of existing language-centric methods. The approach achieves state-of-the-art performance on the E-VQA, InfoSeek, and OK-VQA benchmarks, with absolute accuracy gains of 3.3% to 6.4%.
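As a rough illustration of the "correlation-weighted conflict scoring" and "adaptive decoding" ingredients, the sketch below contrasts the model's parametric next-token distribution with its retrieval-conditioned one and lets the conflict score gate how much the retrieved context overrides the parametric prior. Everything here (the Jensen-Shannon measure, the sigmoid gate, `alpha`, and all function names) is an assumption for illustration, not the authors' released implementation:

```python
import torch
import torch.nn.functional as F

def conflict_score(logits_param: torch.Tensor,
                   logits_ctx: torch.Tensor,
                   correlation: float) -> float:
    """Correlation-weighted conflict between the VLM's parametric
    next-token distribution and its retrieval-conditioned one.
    A Jensen-Shannon divergence stands in for the paper's (unspecified
    in the abstract) conflict measure."""
    p = F.softmax(logits_param, dim=-1)
    q = F.softmax(logits_ctx, dim=-1)
    m = 0.5 * (p + q)
    # JSD(p, q) = 0.5 * KL(p || m) + 0.5 * KL(q || m)
    jsd = 0.5 * (F.kl_div(m.log(), p, reduction="sum")
                 + F.kl_div(m.log(), q, reduction="sum"))
    return correlation * jsd.item()

def adaptive_decode_step(logits_param: torch.Tensor,
                         logits_ctx: torch.Tensor,
                         correlation: float,
                         alpha: float = 5.0) -> torch.Tensor:
    """One decoding step: the stronger the correlation-weighted
    conflict, the more weight the retrieved context receives over
    the parametric prior."""
    score = conflict_score(logits_param, logits_ctx, correlation)
    w = torch.sigmoid(torch.tensor(alpha * score))  # map score to (0, 1)
    return (1.0 - w) * logits_param + w * logits_ctx

# Toy usage: a 5-token vocabulary where the retrieved context
# disagrees with the parametric prior.
logits_param = torch.tensor([2.0, 0.1, 0.1, 0.1, 0.1])
logits_ctx = torch.tensor([0.1, 2.0, 0.1, 0.1, 0.1])
print(adaptive_decode_step(logits_param, logits_ctx, correlation=0.8))
```

Under this reading, a highly correlated but conflicting retrieved statement pushes decoding toward the context, while a low-correlation statement leaves the parametric prediction mostly intact.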

📝 Abstract
Knowledge-based visual question answering (KB-VQA) shows significant potential for knowledge-intensive tasks. However, because model knowledge is fixed at pre-training, conflicts arise between the static parametric knowledge of vision-language models (VLMs) and dynamically retrieved information: outputs either ignore the retrieved context or integrate it inconsistently with parametric knowledge, posing substantial challenges for KB-VQA. Current conflict mitigation methods are mostly adapted from language-only approaches and target context-level conflicts through engineered prompting strategies or context-aware decoding mechanisms. However, these methods neglect the critical role of visual information in conflicts and suffer from redundant retrieved contexts, which impair accurate conflict identification and effective mitigation. To address these limitations, we propose CC-VQA, a novel training-free, conflict- and correlation-aware method for KB-VQA. Our method comprises two core components: (1) Vision-Centric Contextual Conflict Reasoning, which performs visual-semantic conflict analysis across internal and external knowledge contexts; and (2) Correlation-Guided Encoding and Decoding, featuring positional encoding compression for low-correlation statements and adaptive decoding driven by correlation-weighted conflict scoring. Extensive evaluations on the E-VQA, InfoSeek, and OK-VQA benchmarks demonstrate that CC-VQA achieves state-of-the-art performance, with absolute accuracy improvements of 3.3% to 6.4% over existing methods. Code is available at https://github.com/cqu-student/CC-VQA.
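The abstract's "positional encoding compression for low-correlation statements" suggests shrinking the positional footprint of weakly relevant retrieved text so that relevant evidence stays positionally close to the question. Below is a minimal sketch of one way to realize that, assuming per-statement correlation scores are already available; the stride values, threshold, and all names are hypothetical, not the paper's settings:

```python
from typing import List

def compress_positions(statement_correlations: List[float],
                       statement_lengths: List[int],
                       threshold: float = 0.5,
                       low_corr_stride: float = 0.25) -> List[float]:
    """Return a (possibly fractional) position index for every token
    in the concatenated retrieved statements.

    High-correlation statements keep the normal stride of 1.0 per
    token; low-correlation statements advance the position index by
    only `low_corr_stride`, compressing their positional span.
    """
    positions, pos = [], 0.0
    for corr, length in zip(statement_correlations, statement_lengths):
        stride = 1.0 if corr >= threshold else low_corr_stride
        for _ in range(length):
            positions.append(pos)
            pos += stride
    return positions

# Example: two retrieved statements of 3 tokens each,
# the second one barely relevant to the question.
print(compress_positions([0.9, 0.1], [3, 3]))
# -> [0.0, 1.0, 2.0, 3.0, 3.25, 3.5]
```

The fractional positions would then feed a rotary or learned positional encoding; the point of the sketch is only that low-correlation statements consume less of the effective context window.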
Problem

Research questions and friction points this paper is trying to address.

Knowledge Conflict
Visual Question Answering
Vision-Language Models
Retrieved Context
KB-VQA
Innovation

Methods, ideas, or system contributions that make the work stand out.

Knowledge-Based VQA
Knowledge Conflict Mitigation
Vision-Centric Reasoning
Correlation-Guided Decoding
Training-Free Method