ChartHal: A Fine-grained Framework Evaluating Hallucination of Large Vision Language Models in Chart Understanding

📅 2025-09-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large Vision-Language Models (LVLMs) suffer from severe hallucination in chart understanding—especially when questions reference information that is absent from, or contradicts, the chart—yet the field lacks both a fine-grained evaluation framework and a high-quality benchmark dataset for this problem. To address this, we propose the first fine-grained hallucination taxonomy specifically for chart understanding, construct ChartHal—a high-fidelity benchmark comprising 1,062 human-validated chart-question-answer triples—and design evaluation tasks covering diverse hallucination scenarios. Empirical evaluation reveals that state-of-the-art LVLMs, including GPT-5 and o4-mini, achieve only 34.46% and 22.79% accuracy respectively on ChartHal, starkly exposing their reliability limitations. This work establishes a systematic foundation for hallucination modeling, assessment, and mitigation in chart understanding, providing both a rigorous benchmark and a methodological framework to advance trustworthy multimodal reasoning.

📝 Abstract
Large Vision-Language Models (LVLMs) have recently demonstrated remarkable progress, yet hallucination remains a critical barrier, particularly in chart understanding, which requires sophisticated perceptual and cognitive abilities as well as rigorous factual accuracy. While prior work has investigated hallucinations and chart comprehension independently, their intersection remains largely unexplored. To address this gap, we present ChartHal, a benchmark that features a fine-grained taxonomy of hallucination scenarios in chart understanding, along with a human-validated dataset of 1,062 samples. Our evaluation shows that state-of-the-art LVLMs suffer from severe hallucinations on ChartHal, including proprietary models such as GPT-5 and o4-mini, which achieve only 34.46% and 22.79% accuracy, respectively. Further analysis reveals that questions involving information absent from or contradictory to charts are especially likely to trigger hallucinations, underscoring the urgent need for more robust mitigation strategies. Code and data are available at https://github.com/ymcui/ChartHal.
Problem

Research questions and friction points this paper is trying to address.

Evaluating hallucination issues in Large Vision-Language Models for chart understanding
Addressing the lack of fine-grained taxonomy for chart-related hallucination scenarios
Assessing LVLM performance on information absent from or contradictory to charts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-grained taxonomy for chart hallucination scenarios
Human-validated dataset with 1,062 chart samples
Evaluation framework revealing LVLM hallucinations on charts
Xingqi Wang
Department of Computer Science and Technology, Tsinghua University
Yiming Cui
Research Scientist, iFLYTEK Research
Xin Yao
State Key Laboratory of Cognitive Intelligence, iFLYTEK, Beijing, China
Shijin Wang
Tongji University
Guoping Hu
State Key Laboratory of Cognitive Intelligence, iFLYTEK, Beijing, China
Xiaoyu Qin
Tsinghua University