🤖 AI Summary
This study addresses the frequent yet poorly understood failures of multimodal large language models (MLLMs) in interpreting visualizations. We propose the first taxonomy of visualization literacy barriers specific to MLLMs, grounded in visualization literacy theory and derived through human open coding of 309 erroneous responses from four state-of-the-art models on reVLAT, a regenerated benchmark built on synthetic data. Our analysis reveals that while MLLMs perform adequately on simple charts, their reasoning degrades markedly on color-intensive, segment-based visualizations, where they often exhibit inconsistent comparative reasoning. Notably, we identify two machine-specific failure modes not captured by existing human-centered frameworks. These findings provide both theoretical grounding and practical design guidance for developing more reliable AI-powered visualization assistants.
📝 Abstract
Multimodal Large Language Models (MLLMs) are increasingly used to interpret visualizations, yet little is known about why they fail. We present the first systematic analysis of barriers to visualization literacy in MLLMs. Using the regenerated Visualization Literacy Assessment Test (reVLAT), a benchmark built on synthetic data, we open-coded 309 erroneous responses from four state-of-the-art models with a barrier-centric strategy adapted from human visualization literacy research. Our analysis yields a taxonomy of MLLM failures, revealing two machine-specific barriers that extend frameworks derived from prior human-participant studies. Results show that models perform well on simple charts but struggle with color-intensive, segment-based visualizations, often failing to maintain consistent comparative reasoning. Our findings inform the future evaluation and design of reliable AI-driven visualization assistants.
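The abstract describes an evaluation pipeline (run each model on each reVLAT item, keep the erroneous responses, then open-code them), which can be sketched in code. The sketch below is a minimal illustration under stated assumptions, not the authors' harness: `Item`, `query_mllm`, the field names, and the output file are all hypothetical, and `query_mllm` is a stub standing in for whatever vision-model API each of the four models exposes.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Item:
    """One hypothetical reVLAT-style item: a chart image plus a
    multiple-choice question with a known correct option."""
    item_id: str
    chart_path: str
    question: str
    options: list
    answer: str

def query_mllm(model: str, item: Item) -> tuple:
    """Stub for an MLLM call. In a real harness this would send
    item.chart_path and item.question to the model's vision API and
    return (chosen_option, free-text rationale). Hypothetical."""
    return item.options[0], "stub rationale"

def collect_errors(models: list, items: list) -> list:
    """Run every model on every item and keep only the wrong answers,
    together with the model's rationale, as raw material for the kind
    of open coding the paper describes."""
    errors = []
    for model in models:
        for item in items:
            chosen, rationale = query_mllm(model, item)
            if chosen != item.answer:
                errors.append({
                    "model": model,
                    **asdict(item),
                    "chosen": chosen,
                    "rationale": rationale,
                })
    return errors

if __name__ == "__main__":
    # Toy example item; real items would come from the benchmark.
    items = [Item("revlat-001", "charts/stacked_bar_01.png",
                  "Which segment is largest in 2020?",
                  ["A", "B", "C", "D"], "C")]
    errors = collect_errors(["model-a", "model-b"], items)
    with open("erroneous_responses.json", "w") as f:
        json.dump(errors, f, indent=2)
```

Saving the rationale alongside each wrong answer is the step that matters for this kind of study: barrier coding operates on the model's reasoning text, not just its final option.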