EncQA: Benchmarking Vision-Language Models on Visual Encodings for Charts

📅 2025-08-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing evaluations of multimodal vision-language models (VLMs) do not systematically probe fine-grained reasoning over visual encodings in chart understanding, particularly with respect to coverage across encoding channels and analytical task dimensions. Method: We introduce EncQA, a chart understanding benchmark grounded in visualization theory, comprising six visual encoding types and eight analytical task categories instantiated as 2,076 synthetically generated question-answer pairs for fine-grained assessment. Contribution/Results: Evaluating nine state-of-the-art VLMs reveals that performance does not improve monotonically with model scale and exhibits pronounced task- and encoding-specific bottlenecks. By integrating foundational visualization principles into VLM evaluation, the work exposes fundamental limitations of scaling-only strategies and provides empirical evidence to guide targeted architectural improvements and training optimizations for chart comprehension.

📝 Abstract
Multimodal vision-language models (VLMs) continue to achieve ever-improving scores on chart understanding benchmarks. Yet, we find that this progress does not fully capture the breadth of visual reasoning capabilities essential for interpreting charts. We introduce EncQA, a novel benchmark informed by the visualization literature, designed to provide systematic coverage of visual encodings and analytic tasks that are crucial for chart understanding. EncQA provides 2,076 synthetic question-answer pairs, enabling balanced coverage of six visual encoding channels (position, length, area, color quantitative, color nominal, and shape) and eight tasks (find extrema, retrieve value, find anomaly, filter values, compute derived value exact, compute derived value relative, correlate values, and correlate values relative). Our evaluation of 9 state-of-the-art VLMs reveals that performance varies significantly across encodings within the same task, as well as across tasks. Contrary to expectations, we observe that performance does not improve with model size for many task-encoding pairs. Our results suggest that advancing chart understanding requires targeted strategies addressing specific visual reasoning gaps, rather than solely scaling up model or dataset size.
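To make the benchmark's structure concrete, below is a minimal sketch of how one EncQA-style item and the encoding/task grid from the abstract might be represented. The `EncQAItem` class and its field names are illustrative assumptions, not the paper's actual data format.

```python
from dataclasses import dataclass

# Channels and tasks as enumerated in the EncQA abstract; identifiers
# are illustrative snake_case renderings, not the paper's actual labels.
ENCODINGS = [
    "position", "length", "area",
    "color_quantitative", "color_nominal", "shape",
]
TASKS = [
    "find_extrema", "retrieve_value", "find_anomaly", "filter_values",
    "compute_derived_value_exact", "compute_derived_value_relative",
    "correlate_values", "correlate_values_relative",
]

@dataclass
class EncQAItem:
    chart_image: str  # path to a synthetically rendered chart
    encoding: str     # one of ENCODINGS
    task: str         # one of TASKS
    question: str     # natural-language question about the chart
    answer: str       # ground-truth answer
```

Balanced coverage then means the 2,076 items are spread across the 6 x 8 grid of (encoding, task) cells, so each cell can be scored separately.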
Problem

Research questions and friction points this paper is trying to address.

Evaluating VLMs on chart visual encoding comprehension
Assessing performance gaps in chart analytic tasks
Analyzing model size impact on visual reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces EncQA benchmark for visual encodings
Evaluates 9 VLMs on diverse encoding-task pairs (see the sketch after this list)
Highlights need for targeted visual reasoning strategies
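Since the paper's headline finding is that accuracy varies across encodings within a task and across tasks, the natural unit of analysis is the per-cell accuracy over the task-encoding grid. A hedged sketch of that aggregation is below, assuming the `EncQAItem` representation above; `per_cell_accuracy` and the `results` format are illustrative, not EncQA's actual evaluation code.

```python
from collections import defaultdict

def per_cell_accuracy(results):
    """Aggregate accuracy for each (task, encoding) cell.

    `results` is an iterable of (item, is_correct) pairs, where `item`
    is an EncQAItem as sketched above. Illustrative only; EncQA's
    published evaluation code may differ.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for item, is_correct in results:
        key = (item.task, item.encoding)
        total[key] += 1
        correct[key] += int(is_correct)
    return {key: correct[key] / total[key] for key in total}
```

Comparing these per-cell scores across models of different sizes is what surfaces the paper's observation that scaling alone does not close task- or encoding-specific gaps.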