🤖 AI Summary
Problem: Temporal chart summarization suffers from frequent hallucinations and semantically impoverished outputs, which undermines its value for decision support.
Method: We propose a multi-agent collaborative framework featuring (1) an external data analysis module that parses the data and performs statistical computations to strengthen factual grounding; (2) an insight-driven self-consistency verification mechanism that suppresses hallucinations during generation; and (3) the first benchmark dataset for temporal chart summarization with fine-grained hallucination annotations. The approach combines iterative multi-agent collaboration with a hallucination-aware evaluation protocol.
Contributions/Results: On our benchmark, the method reaches a hallucination rate as low as 8.2% and significantly improves summary fidelity and semantic richness over state-of-the-art methods. The code and dataset are publicly released to advance trustworthy vision-language understanding in visualization.
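The external data analysis module is described only at a high level. As a hedged illustration, the Python sketch below computes a few statistics an agent could cite instead of reading values off a chart image: a least-squares trend, the extremes with their timestamps, the mean, and the percent change from the first to the last point. The functions `extract_insights` and `insights_to_facts`, the `tol` parameter, and the choice of statistics are assumptions for illustration, not ChartInsighter's actual module.

```python
from statistics import mean


def extract_insights(times, values, tol=0.0):
    """Hypothetical helper (not ChartInsighter's module): derive simple,
    checkable facts from one time series so summary text can be grounded."""
    if len(times) != len(values) or len(values) < 2:
        raise ValueError("need at least two aligned (time, value) points")
    n = len(values)

    # Least-squares slope of value against point index: a coarse trend signal.
    x_bar = (n - 1) / 2
    y_bar = mean(values)
    num = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(values))
    den = sum((x - x_bar) ** 2 for x in range(n))
    slope = num / den
    if slope > tol:
        trend = "increasing"
    elif slope < -tol:
        trend = "decreasing"
    else:
        trend = "flat"

    # Extremes keep their timestamps so claims such as
    # "the peak was in 2023" can be checked against the data.
    i_max = max(range(n), key=lambda i: values[i])
    i_min = min(range(n), key=lambda i: values[i])

    # Percent change relative to the first value (undefined if it is zero).
    first, last = values[0], values[-1]
    pct_change = (last - first) / first * 100 if first != 0 else None

    return {
        "trend": trend,
        "slope": slope,
        "max": (times[i_max], values[i_max]),
        "min": (times[i_min], values[i_min]),
        "mean": y_bar,
        "pct_change": pct_change,
    }


def insights_to_facts(ins):
    """Render insights as plain-text facts that an agent prompt could quote."""
    facts = [
        f"Overall trend: {ins['trend']} (slope {ins['slope']:.2f} per step).",
        f"Maximum: {ins['max'][1]} at {ins['max'][0]}.",
        f"Minimum: {ins['min'][1]} at {ins['min'][0]}.",
        f"Mean: {ins['mean']:.2f}.",
    ]
    if ins["pct_change"] is not None:
        facts.append(f"Change from first to last point: {ins['pct_change']:+.1f}%.")
    return facts


if __name__ == "__main__":
    ins = extract_insights(["2020", "2021", "2022", "2023"], [10, 12, 9, 15])
    for fact in insights_to_facts(ins):
        print(fact)
```

Computing such facts outside the language model means numeric claims come from arithmetic rather than from visual estimation, which is the kind of factual grounding the module is meant to provide.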
📝 Abstract
Effective chart summaries can significantly reduce the time and effort decision makers spend interpreting charts, enabling precise and efficient communication of data insights. Previous studies have struggled to generate accurate and semantically rich summaries of time-series charts. In this paper, we identify the elements of a summary and the common types of hallucination that arise when summarizing time-series charts, and we use them as guidelines for automatic generation. We introduce ChartInsighter, which automatically generates summaries of time-series charts while effectively reducing hallucinations. Specifically, we assign multiple agents to generate an initial chart summary; the agents collaborate iteratively, invoking external data analysis modules to extract insights and compiling these insights into a coherent summary. Additionally, we apply a self-consistency test to validate and correct the summary. We also create a high-quality benchmark of charts and summaries in which hallucination types are annotated sentence by sentence, enabling evaluation of how effectively a method reduces hallucinations. Evaluations on this benchmark show that our method surpasses state-of-the-art models and achieves the lowest hallucination rate, reducing various types of hallucination and improving summary quality. The benchmark is available at https://github.com/wangfen01/ChartInsighter.
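The benchmark annotates hallucination types per sentence, and results are compared by hallucination rate. The exact definition of that rate is not stated here, so the Python sketch below assumes it is the share of summary sentences carrying at least one hallucination label, pooled over all summaries. `AnnotatedSentence`, `hallucination_rate`, `type_counts`, and the label names `value_error` and `trend_error` are hypothetical, and the sentences in the demo are toy annotations rather than benchmark data.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class AnnotatedSentence:
    """One summary sentence with its hypothetical hallucination-type labels."""
    text: str
    labels: list[str] = field(default_factory=list)  # empty = no hallucination


def hallucination_rate(summaries):
    """Assumed definition: share of sentences with at least one hallucination
    label, pooled over all sentences of all summaries (micro-average)."""
    sentences = [s for summary in summaries for s in summary]
    if not sentences:
        return 0.0
    flagged = sum(1 for s in sentences if s.labels)
    return flagged / len(sentences)


def type_counts(summaries):
    """Count how often each hallucination type was annotated."""
    return Counter(
        label for summary in summaries for s in summary for label in s.labels
    )


if __name__ == "__main__":
    summaries = [
        [
            AnnotatedSentence("Sales rose from 10 in 2020 to 15 in 2023."),
            AnnotatedSentence("The peak occurred in 2021.", ["value_error"]),
        ],
        [
            AnnotatedSentence("The series trends downward overall.", ["trend_error"]),
            AnnotatedSentence("The lowest value, 9, occurred in 2022."),
        ],
    ]
    print(f"hallucination rate: {hallucination_rate(summaries):.1%}")
    print(type_counts(summaries))
```

Pooling sentences across summaries weights longer summaries more heavily; averaging per-summary rates is an equally plausible choice, and the two can give different numbers, so any comparison should fix one definition.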