🤖 AI Summary
Multimodal large language models (MLLMs) face an interpretability bottleneck in chart analysis: their “black-box” decisions lack traceability to specific visual regions, hindering trustworthy real-world deployment. To address this, we propose a reasoning-guided attribution framework—the first systematic approach to enhance MLLMs’ capability to localize visual evidence during mathematical chart reasoning. Methodologically, we construct the first large-scale, fine-grained chart reasoning–attribution alignment dataset, leveraging semi-automatic annotation that integrates chain-of-thought reasoning generation with attention-based region alignment; crucially, our framework explicitly binds each reasoning step to its corresponding chart region. Experiments demonstrate a 15% improvement in attribution accuracy and a BERTScore of 0.90 for answer quality, significantly enhancing decision transparency and result reliability.
📝 Abstract
Data visualizations like charts are fundamental tools for quantitative analysis and decision-making across fields, requiring accurate interpretation and mathematical reasoning. The emergence of Multimodal Large Language Models (MLLMs) offers promising capabilities for automated visual data analysis, such as processing charts, answering questions, and generating summaries. However, they provide no visibility into which parts of the visual data informed their conclusions; this black-box nature poses significant challenges to real-world trust and adoption. In this paper, we take the first major step towards evaluating and enhancing the capabilities of MLLMs to attribute their reasoning process by highlighting the specific regions in charts and graphs that justify model answers. To this end, we contribute RADAR, a semi-automatic approach to obtain a benchmark dataset comprising 17,819 diverse samples with charts, questions, reasoning steps, and attribution annotations. We also introduce a method that provides attribution for chart-based mathematical reasoning. Experimental results demonstrate that our reasoning-guided approach improves attribution accuracy by 15% compared to baseline methods, and enhanced attribution capabilities translate to stronger answer generation, achieving an average BERTScore of $\sim$0.90, indicating high alignment with ground truth responses. This advancement represents a significant step toward more interpretable and trustworthy chart analysis systems, enabling users to verify and understand model decisions through reasoning and attribution.