🤖 AI Summary
This study addresses the challenge that current vision-language models struggle to detect misleading data representations in chart captions caused by subtle reasoning errors or flaws in visualization design. To this end, the authors construct the first fine-grained benchmark that pairs real-world visualizations with human-authored, curated misleading captions, and introduce a novel taxonomy categorizing misleading content into reasoning errors and visualization design errors, enabling precise attribution of deception sources. Systematic evaluation of various open-source and commercial vision-language models reveals that these models are significantly better at identifying visualization design flaws than reasoning-based misinformation, yet they frequently misclassify accurate charts as deceptive. This work bridges the gap between coarse-grained deception detection and fine-grained error-type identification, highlighting limitations of current models in complex semantic understanding.
📝 Abstract
Visualizations help communicate data insights, but deceptive data representations can distort their interpretation and propagate misinformation. While recent Vision Language Models (VLMs) perform well on many chart understanding tasks, their ability to detect misleading visualizations, especially when deception arises from subtle reasoning errors in captions, remains poorly understood. Here, we evaluate VLMs on misleading visualization-caption pairs grounded in a fine-grained taxonomy of reasoning errors (e.g., Cherry-picking, Causal inference) and visualization design errors (e.g., Truncated axis, Dual axis, Inappropriate encodings). To this end, we develop a benchmark that combines real-world visualizations with human-authored, curated misleading captions designed to elicit specific reasoning and visualization error types, enabling controlled analysis across error categories and modalities of misleadingness. Evaluating a range of commercial and open-source VLMs, we find that models detect visual design errors substantially more reliably than reasoning-based misinformation, and frequently misclassify non-misleading visualizations as deceptive. Overall, our work fills the gap between coarse detection of misleading content and the attribution of the specific reasoning or visualization errors that give rise to it.