InfoChartQA: A Benchmark for Multimodal Question Answering on Infographic Charts

šŸ“… 2025-05-25
šŸ“ˆ Citations: 0
✨ Influential: 0
šŸ¤– AI Summary
Existing visual question answering (VQA) benchmarks inadequately evaluate how multimodal large language models (MLLMs) comprehend infographics (visualizations rich in design-driven elements such as icons, metaphors, and symbolic representations), because they lack semantically aligned but structurally divergent infographic–chart pairs and questions grounded in those visual elements. To address this gap, we introduce InfoChartQA, the first infographic-oriented multimodal VQA benchmark, comprising 5,642 pairs of infographics and conventional charts. We propose a question-generation strategy guided by semantic annotations of visual elements to ensure fine-grained alignment with design features. This paired design enables precise error analysis and ablation studies, revealing a critical weakness in MLLMs: across 20 state-of-the-art models, average performance drops by over 30% on infographic tasks, with metaphor-related questions exhibiting the lowest accuracy. The benchmark is publicly released to advance infographic-aware multimodal modeling.

šŸ“ Abstract
Understanding infographic charts with design-driven visual elements (e.g., pictograms, icons) requires both visual recognition and reasoning, posing challenges for multimodal large language models (MLLMs). However, existing visual question answering benchmarks fall short in evaluating these capabilities of MLLMs due to the lack of paired plain charts and visual-element-based questions. To bridge this gap, we introduce InfoChartQA, a benchmark for evaluating MLLMs on infographic chart understanding. It includes 5,642 pairs of infographic and plain charts, each sharing the same underlying data but differing in visual presentation. We further design visual-element-based questions to capture their unique visual designs and communicative intent. Evaluation of 20 MLLMs reveals a substantial performance decline on infographic charts, particularly for visual-element-based questions related to metaphors. The paired infographic and plain charts enable fine-grained error analysis and ablation studies, which highlight new opportunities for advancing MLLMs in infographic chart understanding. We release InfoChartQA at https://github.com/CoolDawnAnt/InfoChartQA.
Problem

Research questions and friction points this paper is trying to address.

Evaluating MLLMs on infographic chart understanding
Addressing lack of visual-element-based QA benchmarks
Analyzing performance gaps in metaphor-related questions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces InfoChartQA benchmark for MLLMs
Pairs infographic and plain charts
Designs visual-element-based questions