🤖 AI Summary
To address the limited explanation-generation capability in chart understanding, which undermines agent trustworthiness and user comprehension, this paper introduces ChartQA-X, the first large-scale dataset of jointly annotated chart questions, answers, and natural-language explanations (28,299 samples) covering diverse chart types. Explanations are generated by prompting six different models and filtered with a multidimensional automated evaluation (faithfulness, informativeness, coherence, and perplexity), so that only the best candidate response is retained for each question. The approach combines multi-model prompting, vision-language model fine-tuning, and cross-dataset generalization validation. Experiments show state-of-the-art explanation quality across all metrics, substantial question-answering accuracy gains on unseen datasets, and measurable improvements in user comprehension and trust in the generated responses.
📝 Abstract
The ability to interpret and explain complex information from visual data in charts is crucial for data-driven decision-making. In this work, we address the challenge of providing explanations alongside answering questions about chart images. We present ChartQA-X, a comprehensive dataset comprising various chart types with 28,299 contextually relevant questions, answers, and detailed explanations. These explanations are generated by prompting six different models and selecting the best responses based on metrics such as faithfulness, informativeness, coherence, and perplexity. Our experiments show that models fine-tuned on our dataset for explanation generation achieve superior performance across various metrics and demonstrate improved accuracy in question-answering tasks on new datasets. By integrating answers with explanatory narratives, our approach enhances the ability of intelligent agents to convey complex information effectively, improve user understanding, and foster trust in the generated responses.
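The candidate-selection step described above (prompting several models, scoring each candidate explanation on faithfulness, informativeness, coherence, and perplexity, and keeping the best) can be sketched roughly as follows. This is an illustrative assumption about how such a filter could be wired, not the paper's implementation; the score ranges, the combination rule, and the function name `select_best_explanation` are all hypothetical.

```python
def select_best_explanation(candidates):
    """Pick the candidate explanation with the highest combined score.

    `candidates` is a list of dicts with keys:
      "text"            -- the candidate explanation string
      "faithfulness", "informativeness", "coherence"
                        -- quality scores, assumed in [0, 1], higher is better
      "perplexity"      -- language-model perplexity, lower is better
    """
    def score(c):
        # Invert perplexity so every term rewards quality; the equal
        # weighting of the four metrics is an illustrative choice.
        return (c["faithfulness"]
                + c["informativeness"]
                + c["coherence"]
                + 1.0 / (1.0 + c["perplexity"]))
    return max(candidates, key=score)


# Toy candidates standing in for outputs from different prompted models.
candidates = [
    {"text": "Sales rose 12% in Q3, the largest quarterly gain shown.",
     "faithfulness": 0.9, "informativeness": 0.8,
     "coherence": 0.85, "perplexity": 12.0},
    {"text": "The chart shows some numbers.",
     "faithfulness": 0.6, "informativeness": 0.3,
     "coherence": 0.7, "perplexity": 30.0},
]
best = select_best_explanation(candidates)
```

In practice each metric would come from its own scorer (e.g. an entailment model for faithfulness, a language model for perplexity); the dict-of-scores interface above simply makes the selection logic explicit.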