MultiChartQA: Benchmarking Vision-Language Models on Multi-Chart Problems

📅 2024-10-18
🏛️ arXiv.org
📈 Citations: 1
Influential: 1
🤖 AI Summary
Existing chart-understanding benchmarks focus almost exclusively on single-chart tasks and therefore cannot assess a model's ability to integrate information across charts or perform multi-hop reasoning in realistic multi-chart scenarios. To close this gap, the authors introduce MultiChartQA, a vision-language benchmark dedicated to joint understanding of multiple charts. MultiChartQA systematically defines and covers four categories of multi-hop, cross-chart tasks: direct question answering, parallel question answering, comparative reasoning, and sequential reasoning. It is built from manually curated multi-chart–question pairs and employs a multi-dimensional evaluation protocol for standardized assessment of state-of-the-art multimodal large language models (MLLMs). Experimental results show that current MLLMs trail human accuracy by 32.7% on average, exposing clear limitations in cross-chart reasoning. The work establishes a reproducible benchmark and points to concrete directions for advancing multi-chart reasoning research.

📝 Abstract
Multimodal Large Language Models (MLLMs) have demonstrated impressive abilities across various tasks, including visual question answering and chart comprehension, yet existing benchmarks for chart-related tasks fall short in capturing the complexity of real-world multi-chart scenarios. Current benchmarks primarily focus on single-chart tasks, neglecting the multi-hop reasoning required to extract and integrate information from multiple charts, which is essential in practical applications. To fill this gap, we introduce MultiChartQA, a benchmark that evaluates MLLMs' capabilities in four key areas: direct question answering, parallel question answering, comparative reasoning, and sequential reasoning. Our evaluation of a wide range of MLLMs reveals significant performance gaps compared to humans. These results highlight the challenges in multi-chart comprehension and the potential of MultiChartQA to drive advancements in this field. Our code and data are available at https://github.com/Zivenzhu/Multi-chart-QA
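As a rough illustration of how a benchmark of this shape is typically scored (this is a hedged sketch, not the authors' actual evaluation harness; the record format, field names, and the exact-match metric below are all assumptions), one can group the curated multi-chart–question pairs by the four task categories from the abstract and report per-category and overall accuracy:

```python
from collections import defaultdict

# Hypothetical record format: each example pairs multiple chart images
# with a question, a gold answer, and one of the four task categories.
examples = [
    {"charts": ["a.png", "b.png"], "question": "Which year had the larger gap?",
     "answer": "2019", "category": "comparative reasoning"},
    {"charts": ["c.png", "d.png"], "question": "What is the combined total?",
     "answer": "42", "category": "sequential reasoning"},
]

def score(examples, predict):
    """Exact-match accuracy per task category and overall (assumed metric)."""
    hits, totals = defaultdict(int), defaultdict(int)
    for ex in examples:
        pred = predict(ex["charts"], ex["question"])
        totals[ex["category"]] += 1
        hits[ex["category"]] += int(pred.strip().lower() == ex["answer"].strip().lower())
    per_cat = {c: hits[c] / totals[c] for c in totals}
    overall = sum(hits.values()) / sum(totals.values())
    return per_cat, overall

# Trivial stand-in "model" that always answers "2019".
per_cat, overall = score(examples, lambda charts, question: "2019")
```

On the two toy records above, the stand-in model gets the comparative-reasoning item right and the sequential-reasoning item wrong, so overall accuracy is 0.5; the human-vs-model gap reported in the paper would be the difference between such model scores and human scores on the same pairs.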
Problem

Research questions and friction points this paper is trying to address.

Evaluates MLLMs on multi-chart comprehension
Introduces MultiChartQA benchmark
Addresses multi-hop reasoning in charts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Benchmarks multi-chart scenarios
Evaluates multimodal language models
Focuses on multi-hop reasoning