ReasonVQA: A Multi-hop Reasoning Benchmark with Structural Knowledge for Visual Question Answering

📅 2025-07-22

🤖 AI Summary

Problem: Existing VQA datasets lack complex questions that require multi-hop reasoning grounded in structured encyclopedic knowledge, so they cannot adequately evaluate a model's ability to reason over external knowledge. Method: We introduce ReasonVQA, a structured-knowledge-enhanced VQA benchmark explicitly designed for multi-hop reasoning. We propose an automated construction framework that jointly performs image semantic parsing and knowledge graph path mining to generate high-quality, scalable multi-hop visual question-answer pairs. Contribution/Results: ReasonVQA exceeds the largest comparable knowledge-based benchmarks in scale by more than an order of magnitude and significantly increases task difficulty. Experiments show that state-of-the-art VQA models suffer substantial performance degradation on ReasonVQA, with average accuracy below 40%, which supports its use as a challenging evaluation benchmark and as a resource for advancing knowledge-aware vision-language reasoning research.

📝 Abstract

In this paper, we propose a new dataset, ReasonVQA, for the Visual Question Answering (VQA) task. Our dataset is automatically integrated with structured encyclopedic knowledge and constructed using a low-cost framework, which is capable of generating complex, multi-hop questions. We evaluated state-of-the-art VQA models on ReasonVQA, and the empirical results demonstrate that ReasonVQA poses significant challenges to these models, highlighting its potential for benchmarking and advancing the field of VQA. Additionally, our dataset can be easily scaled with respect to input images; the current version surpasses the largest existing datasets requiring external knowledge by more than an order of magnitude.

Problem

Research questions and friction points this paper is trying to address.

Proposes the ReasonVQA dataset for multi-hop VQA tasks

Integrates structured knowledge to generate complex questions

Poses significant challenges to state-of-the-art VQA models, making it useful for benchmarking

Innovation

Methods, ideas, or system contributions that make the work stand out.

Automatically integrates structured encyclopedic knowledge

Low-cost framework generates multi-hop questions

Scalable dataset surpasses existing knowledge-based VQA datasets
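
To make the idea of generating multi-hop questions from structured knowledge concrete, here is a minimal sketch. It is not the authors' framework: the toy knowledge graph `KG`, the helper names `mine_paths`, `make_question`, and `generate_qa`, and the nested "the X of Y" question template are all assumptions chosen for illustration. The sketch assumes an upstream step has already linked an object in the image to a knowledge graph entity.

```python
from typing import Dict, List, Tuple

# Toy knowledge graph keyed by (subject, relation). Illustrative only;
# it is not drawn from the ReasonVQA data.
KG: Dict[Tuple[str, str], str] = {
    ("Eiffel Tower", "city"): "Paris",
    ("Paris", "country"): "France",
    ("France", "official language"): "French",
}


def mine_paths(
    kg: Dict[Tuple[str, str], str], start: str, hops: int
) -> List[Tuple[List[str], str]]:
    """Return (relation path, end entity) for every path of exactly `hops` edges."""
    paths: List[Tuple[List[str], str]] = [([], start)]
    for _ in range(hops):
        # Extend every partial path by one outgoing edge of its end node.
        paths = [
            (relations + [rel], obj)
            for relations, node in paths
            for (subj, rel), obj in kg.items()
            if subj == node
        ]
    return paths


def make_question(visual_phrase: str, relations: List[str]) -> str:
    """Nest one 'the <relation> of ...' phrase per hop around the visual phrase."""
    phrase = visual_phrase
    for rel in relations:
        phrase = f"the {rel} of {phrase}"
    return f"What is {phrase}?"


def generate_qa(
    kg: Dict[Tuple[str, str], str], entity: str, visual_phrase: str, hops: int
) -> List[Tuple[str, str]]:
    """Build (question, answer) pairs for an entity assumed to be detected in an image."""
    return [
        (make_question(visual_phrase, relations), answer)
        for relations, answer in mine_paths(kg, entity, hops)
    ]


# Hypothetical usage: the entity link "Eiffel Tower" is assumed to come from image parsing.
for question, answer in generate_qa(KG, "Eiffel Tower", "the tower in this image", 2):
    print(question, "->", answer)
# prints: What is the country of the city of the tower in this image? -> France
```

The question never names the entity, so answering it requires first recognizing the object in the image and then following each relation in turn. Increasing `hops` lengthens the reasoning chain without any extra annotation effort, which is the property a low-cost, scalable construction process relies on.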
