🤖 AI Summary
Chart Question Answering (CQA) faces two key challenges: chart images are hard to interpret, and chart questions often demand complex logical and numerical reasoning. To address these, we propose VProChart, a framework built on two components. The first is VPAgent, a lightweight Visual Perception Alignment Agent that explicitly models spatial and semantic relationships among chart elements. The second is a programmatic problem-solving approach in which a large language model turns a question into a structured solution program, combining structured chart parsing, symbolic execution, and verifiable computation. Unlike end-to-end black-box approaches, VProChart decouples visual perception from precise numerical and logical reasoning, which makes inference interpretable and auditable. Evaluated on the ChartQA and PlotQA benchmarks, the method achieves new state-of-the-art performance, improving accuracy by up to 8.2%, with the largest gains on multi-hop reasoning and arithmetic tasks. This work offers a paradigm for explainable and verifiable chart understanding.
📝 Abstract
Charts are widely used for data visualization across various fields, including education, research, and business. Chart Question Answering (CQA) is an emerging task focused on automatically interpreting and reasoning over data presented in charts. However, chart images are inherently difficult to interpret, and chart-related questions often involve complex logical and numerical reasoning, which hinders the performance of existing models. This paper introduces VProChart, a novel framework designed to address these challenges in CQA by integrating a lightweight Visual Perception Alignment Agent (VPAgent) and a Programmatic Solution Reasoning approach. VPAgent aligns and models chart elements based on principles of human visual perception, enhancing the understanding of chart context. The Programmatic Solution Reasoning approach leverages large language models (LLMs) to transform natural language reasoning questions into structured solution programs, facilitating precise numerical and logical reasoning. Extensive experiments on benchmark datasets such as ChartQA and PlotQA demonstrate that VProChart significantly outperforms existing methods, highlighting its capability in understanding and reasoning with charts.
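
To make the idea of a "structured solution program" more concrete, here is a minimal sketch in Python. The abstract does not specify VProChart's actual program format, so everything below is an assumption made for illustration: the operation names (`lookup`, `subtract`, `divide`, `greater`, `max_label`), the `Ref` type, the `run_program` helper, and the toy chart table are all hypothetical. The sketch only shows the general pattern of answering a question by executing explicit steps over parsed chart values, so that each intermediate result can be inspected.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class Ref:
    """Hypothetical reference to the result of an earlier program step."""
    step: int


Arg = Union[str, float, Ref]
Step = Tuple[str, Tuple[Arg, ...]]


def run_program(program: List[Step], table: Dict[str, float]):
    """Execute steps in order and return (final answer, per-step trace)."""
    trace = []
    for op, args in program:
        # Replace references with the results of the steps they point at.
        vals = [trace[a.step] if isinstance(a, Ref) else a for a in args]
        if op == "lookup":          # read one value from the parsed chart
            result = table[vals[0]]
        elif op == "subtract":
            result = vals[0] - vals[1]
        elif op == "divide":
            result = vals[0] / vals[1]
        elif op == "greater":
            result = vals[0] > vals[1]
        elif op == "max_label":     # label of the largest value in the chart
            result = max(table, key=table.get)
        else:
            raise ValueError(f"unknown operation: {op}")
        trace.append(result)
    return trace[-1], trace


# Toy parsed chart (made-up values): category label -> bar height.
table = {"2018": 12.0, "2019": 15.5, "2020": 9.5}

# Question: "How much higher is the 2019 bar than the 2020 bar?"
# In the approach described above, an LLM would produce a program like
# this one; here it is written by hand.
program = [
    ("lookup", ("2019",)),
    ("lookup", ("2020",)),
    ("subtract", (Ref(0), Ref(1))),
]

answer, trace = run_program(program, table)
print(answer, trace)
```

Because the arithmetic is carried out by ordinary code instead of being generated token by token, the numerical result is exact, and the trace records which chart values fed into it.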