ChartHal: A Fine-grained Framework Evaluating Hallucination of Large Vision Language Models in Chart Understanding

📅 2025-09-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large Vision-Language Models (LVLMs) suffer from severe hallucination in chart understanding—especially when questions reference information that is absent from, or contradicts, the chart—yet the field lacks both a fine-grained evaluation framework and a high-quality benchmark dataset for this problem. To address this, we propose the first fine-grained hallucination taxonomy specifically for chart understanding, construct ChartHal—a high-fidelity benchmark comprising 1,062 human-validated chart-question-answer triples—and design evaluation tasks covering diverse hallucination scenarios. Empirical evaluation reveals that state-of-the-art LVLMs, including GPT-5 and o4-mini, achieve only 34.46% and 22.79% accuracy respectively on ChartHal, starkly exposing their reliability limitations. This work establishes a systematic foundation for hallucination modeling, assessment, and mitigation in chart understanding, providing both a rigorous benchmark and a methodological framework to advance trustworthy multimodal reasoning.

📝 Abstract
Large Vision-Language Models (LVLMs) have recently demonstrated remarkable progress, yet hallucination remains a critical barrier, particularly in chart understanding, which requires sophisticated perceptual and cognitive abilities as well as rigorous factual accuracy. While prior work has investigated hallucinations and chart comprehension independently, their intersection remains largely unexplored. To address this gap, we present ChartHal, a benchmark that features a fine-grained taxonomy of hallucination scenarios in chart understanding, along with a human-validated dataset of 1,062 samples. Our evaluation shows that state-of-the-art LVLMs suffer from severe hallucinations on ChartHal, including proprietary models such as GPT-5 and o4-mini, which achieve only 34.46% and 22.79% accuracy, respectively. Further analysis reveals that questions involving information absent from or contradictory to charts are especially likely to trigger hallucinations, underscoring the urgent need for more robust mitigation strategies. Code and data are available at https://github.com/ymcui/ChartHal.
Problem

Research questions and friction points this paper is trying to address.

Evaluating hallucination issues in Large Vision-Language Models for chart understanding
Addressing the lack of fine-grained taxonomy for chart-related hallucination scenarios
Assessing LVLM performance on information absent from or contradictory to charts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-grained taxonomy for chart hallucination scenarios
Human-validated dataset with 1,062 chart samples
Evaluation framework revealing LVLM hallucinations on charts
Xingqi Wang
Department of Computer Science and Technology, Tsinghua University
Yiming Cui
Research Scientist, iFLYTEK Research
Xin Yao
State Key Laboratory of Cognitive Intelligence, iFLYTEK, Beijing, China
Shijin Wang
Tongji University
Guoping Hu
State Key Laboratory of Cognitive Intelligence, iFLYTEK, Beijing, China
Xiaoyu Qin
Tsinghua University