MultiChartQA: Benchmarking Vision-Language Models on Multi-Chart Problems

📅 2024-10-18
🏛️ arXiv.org
📈 Citations: 1
Influential: 1
🤖 AI Summary
Existing chart-understanding benchmarks focus almost exclusively on single-chart tasks and therefore cannot assess a model's ability to integrate information across charts or perform multi-hop reasoning in realistic multi-chart scenarios. To close this gap, the authors introduce MultiChartQA, a vision-language benchmark dedicated to joint understanding of multiple charts. MultiChartQA systematically defines and covers four categories of multi-hop, cross-chart tasks: direct question answering, parallel question answering, comparative reasoning, and sequential reasoning. It is built from manually curated multi-chart–question pairs and employs a multi-dimensional evaluation protocol for standardized assessment of state-of-the-art multimodal large language models (MLLMs). Experimental results show that current MLLMs trail human accuracy by 32.7% on average, exposing clear limitations in cross-chart reasoning. The work establishes a reproducible benchmark and points to concrete directions for advancing multi-chart reasoning research.

📝 Abstract
Multimodal Large Language Models (MLLMs) have demonstrated impressive abilities across various tasks, including visual question answering and chart comprehension, yet existing benchmarks for chart-related tasks fall short in capturing the complexity of real-world multi-chart scenarios. Current benchmarks primarily focus on single-chart tasks, neglecting the multi-hop reasoning required to extract and integrate information from multiple charts, which is essential in practical applications. To fill this gap, we introduce MultiChartQA, a benchmark that evaluates MLLMs' capabilities in four key areas: direct question answering, parallel question answering, comparative reasoning, and sequential reasoning. Our evaluation of a wide range of MLLMs reveals significant performance gaps compared to humans. These results highlight the challenges in multi-chart comprehension and the potential of MultiChartQA to drive advancements in this field. Our code and data are available at https://github.com/Zivenzhu/Multi-chart-QA
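As a rough illustration of how a benchmark of this shape is typically scored (this is a hedged sketch, not the authors' actual evaluation harness; the record format, field names, and the exact-match metric below are all assumptions), one can group the curated multi-chart–question pairs by the four task categories from the abstract and report per-category and overall accuracy:

```python
from collections import defaultdict

# Hypothetical record format: each example pairs multiple chart images
# with a question, a gold answer, and one of the four task categories.
examples = [
    {"charts": ["a.png", "b.png"], "question": "Which year had the larger gap?",
     "answer": "2019", "category": "comparative reasoning"},
    {"charts": ["c.png", "d.png"], "question": "What is the combined total?",
     "answer": "42", "category": "sequential reasoning"},
]

def score(examples, predict):
    """Exact-match accuracy per task category and overall (assumed metric)."""
    hits, totals = defaultdict(int), defaultdict(int)
    for ex in examples:
        pred = predict(ex["charts"], ex["question"])
        totals[ex["category"]] += 1
        hits[ex["category"]] += int(pred.strip().lower() == ex["answer"].strip().lower())
    per_cat = {c: hits[c] / totals[c] for c in totals}
    overall = sum(hits.values()) / sum(totals.values())
    return per_cat, overall

# Trivial stand-in "model" that always answers "2019".
per_cat, overall = score(examples, lambda charts, question: "2019")
```

On the two toy records above, the stand-in model gets the comparative-reasoning item right and the sequential-reasoning item wrong, so overall accuracy is 0.5; the human-vs-model gap reported in the paper would be the difference between such model scores and human scores on the same pairs.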
Problem

Research questions and friction points this paper is trying to address.

Evaluates MLLMs on multi-chart comprehension
Introduces MultiChartQA benchmark
Addresses multi-hop reasoning in charts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Benchmarks multi-chart scenarios
Evaluates multimodal language models
Focuses on multi-hop reasoning