Losing the Plot: How VLM responses degrade on imperfect charts

📅 2025-09-22
🤖 AI Summary
Real-world charts often contain noise, occlusions, and logical inconsistencies. Under these conditions, existing vision-language models (VLMs) hallucinate (numerical fabrication, trend misinterpretation, and entity confusion) and remain highly confident even as their accuracy drops. This work systematically characterizes VLM vulnerabilities in reasoning over distorted charts and introduces CHART NOISe, the first multimodal benchmark to combine chart corruption, occlusion, and prompt reverse inconsistency, in which a model contradicts itself when asked to confirm versus deny the same statement. The authors design a multiple-choice evaluation framework inspired by the English section of Korea's CSAT and propose a quantitative metric for prompt reverse inconsistency. Evaluated on state-of-the-art models, including ChatGPT 4o, Claude Sonnet 4, and Gemini 2.5 Pro, CHART NOISe exposes both hallucination and overconfidence, and it provides a reproducible, rigorous testbed for robust chart understanding.

📝 Abstract
Vision-language models (VLMs) show strong results on chart understanding, yet existing benchmarks assume clean figures and fact-based queries. Real-world charts often contain distortions and demand reasoning beyond simple matching. We evaluate ChatGPT 4o, Claude Sonnet 4, and Gemini 2.5 Pro and find sharp performance drops under corruption or occlusion, with hallucinations such as value fabrication, trend misinterpretation, and entity confusion becoming more frequent. Models remain overconfident in degraded settings, generating plausible but unsupported explanations. To address this gap, we introduce CHART NOISe (Chart Hallucinations, Answers, and Reasoning Testing on Noisy and Occluded Input Selections), a dataset combining chart corruptions, occlusions, and exam-style multiple-choice questions inspired by the English section of Korea's CSAT. A key innovation is prompt reverse inconsistency, where models contradict themselves when asked to confirm versus deny the same statement. Our contributions are threefold: (1) benchmarking state-of-the-art VLMs, exposing systematic vulnerabilities in chart reasoning; (2) releasing CHART NOISe, the first dataset unifying corruption, occlusion, and reverse inconsistency; and (3) proposing baseline mitigation strategies such as quality filtering and occlusion detection. Together, these efforts establish a rigorous testbed for advancing robustness and reliability in chart understanding.
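The abstract describes prompt reverse inconsistency only in words, so the following is a minimal sketch of how such a rate could be scored, not the paper's actual metric. It assumes each statement about a chart is posed twice ("Is this statement true?" and "Is this statement false?") and that each answer has already been reduced to yes or no. The names `ReversePair`, `is_inconsistent`, and `reverse_inconsistency_rate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReversePair:
    """Yes/no answers to two opposite framings of one chart statement (hypothetical schema)."""
    says_true: bool   # model answered "yes" to "Is this statement true?"
    says_false: bool  # model answered "yes" to "Is this statement false?"


def is_inconsistent(pair: ReversePair) -> bool:
    # A coherent model gives opposite answers to the two framings,
    # so identical answers (yes/yes or no/no) count as a contradiction.
    return pair.says_true == pair.says_false


def reverse_inconsistency_rate(pairs: list[ReversePair]) -> float:
    """Fraction of statements on which the model contradicts itself."""
    if not pairs:
        raise ValueError("need at least one answer pair")
    return sum(is_inconsistent(p) for p in pairs) / len(pairs)
```

Under this assumed definition, a model that contradicts itself on two of four statements has a rate of 0.5, and the rate can be compared between clean and degraded versions of the same charts.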
Problem

Research questions and friction points this paper is trying to address.

Evaluating VLM performance degradation on corrupted or occluded charts
Addressing hallucinations like value fabrication and trend misinterpretation
Establishing a testbed for robustness in real-world chart understanding
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introducing the CHART NOISe dataset, which unifies chart corruptions, occlusions, and prompt reverse inconsistency
Proposing baseline mitigation strategies like quality filtering
Establishing a testbed for robustness in chart understanding
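
The kinds of degradation named above can be illustrated with a short sketch. The page does not specify which corruptions or occlusion shapes CHART NOISe uses, so additive Gaussian noise and a filled rectangle are assumptions chosen for illustration, and the functions `add_gaussian_noise` and `occlude` are hypothetical. A chart image is modeled here as rows of grayscale values in the range 0 to 255.

```python
import random


def add_gaussian_noise(image, sigma, seed=0):
    """Return a copy of a grayscale image with Gaussian noise added to every pixel.

    Assumption: Gaussian noise stands in for "corruption"; the dataset may use other types.
    """
    rng = random.Random(seed)  # seeded so the corruption is reproducible
    return [
        [min(255, max(0, round(px + rng.gauss(0, sigma)))) for px in row]
        for row in image
    ]


def occlude(image, top, left, height, width, fill=0):
    """Return a copy of the image with a rectangular patch overwritten by `fill`.

    The patch is clipped to the image bounds; the input image is left unchanged.
    """
    out = [list(row) for row in image]
    for r in range(max(0, top), min(len(out), top + height)):
        for c in range(max(0, left), min(len(out[r]), left + width)):
            out[r][c] = fill
    return out
```

Applying both functions to the same chart yields clean, noisy, and occluded variants, so a model's answers can be compared across conditions with the question held fixed.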