VProChart: Answering Chart Question through Visual Perception Alignment Agent and Programmatic Solution Reasoning

📅 2024-09-03
🏛️ AAAI Conference on Artificial Intelligence
📈 Citations: 0
Influential: 0
🤖 AI Summary
Chart Question Answering (CQA) faces two key challenges: chart images are hard to interpret, and chart questions often demand complex logical and numerical reasoning. To address these, we propose VProChart, which pairs VPAgent, a lightweight Visual Perception Alignment Agent that explicitly models spatial and semantic relationships among chart elements for the first time, with a large language model (LLM)-driven programmatic solution reasoning framework that integrates structured chart parsing, symbolic execution, and verifiable computation. Unlike end-to-end black-box approaches, VProChart decouples visual perception from precise numerical and logical reasoning, which makes inference interpretable and auditable. On the ChartQA and PlotQA benchmarks, the method achieves new state-of-the-art performance, improving accuracy by up to 8.2%, with particular gains on multi-hop reasoning and arithmetic questions. This work establishes a new paradigm for explainable and verifiable chart understanding.
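
The summary stresses that perception and reasoning are decoupled: the visual stage hands over structured data, and all arithmetic happens downstream on that data. As a minimal sketch of the structured-parsing step, assume the perception stage emits a linearized table (header row first, cells separated by ` | `, rows separated by newlines). That format and the helper names `parse_linearized_table` and `_to_number` are hypothetical illustrations, not the paper's interface.

```python
from typing import Dict, List, Union

Cell = Union[float, str]


def _to_number(cell: str) -> Cell:
    # Strip common chart decorations (thousands separators, percent signs)
    # and fall back to the raw string for non-numeric cells.
    cleaned = cell.replace(",", "").rstrip("%")
    try:
        return float(cleaned)
    except ValueError:
        return cell


def parse_linearized_table(text: str, row_sep: str = "\n", col_sep: str = " | ") -> Dict[str, List[Cell]]:
    """Parse a linearized chart table into a column-oriented dict.

    Assumed format (hypothetical): the first row holds column headers and each
    following row holds one data point; numeric cells become floats.
    """
    rows = [r for r in text.strip().split(row_sep) if r.strip()]
    header = [h.strip() for h in rows[0].split(col_sep)]
    columns: Dict[str, List[Cell]] = {h: [] for h in header}
    for row in rows[1:]:
        cells = [c.strip() for c in row.split(col_sep)]
        if len(cells) != len(header):
            raise ValueError(f"row has {len(cells)} cells, expected {len(header)}: {row!r}")
        for name, cell in zip(header, cells):
            columns[name].append(_to_number(cell))
    return columns


# Example: parse_linearized_table("Year | Sales\n2019 | 1,200\n2020 | 1,500")
# returns {"Year": [2019.0, 2020.0], "Sales": [1200.0, 1500.0]}
```

Keeping non-numeric cells as strings lets category labels such as country names survive parsing, while numeric cells become floats that a later solution program can compute on exactly.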

📝 Abstract
Charts are widely used for data visualization across various fields, including education, research, and business. Chart Question Answering (CQA) is an emerging task focused on the automatic interpretation of, and reasoning over, data presented in charts. However, chart images are inherently difficult to interpret, and chart-related questions often involve complex logical and numerical reasoning, which hinders the performance of existing models. This paper introduces VProChart, a novel framework designed to address these challenges in CQA by integrating a lightweight Visual Perception Alignment Agent (VPAgent) and a Programmatic Solution Reasoning approach. VPAgent aligns and models chart elements based on principles of human visual perception, enhancing the understanding of chart context. The Programmatic Solution Reasoning approach leverages large language models (LLMs) to transform natural language reasoning questions into structured solution programs, facilitating precise numerical and logical reasoning. Extensive experiments on benchmark datasets such as ChartQA and PlotQA demonstrate that VProChart significantly outperforms existing methods, highlighting its capability in understanding and reasoning with charts.
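
The Programmatic Solution Reasoning idea is that the LLM writes a short program for the question, and the answer comes from executing that program rather than from the model stating a number directly. A minimal sketch of the execution side follows, assuming the generated program reads chart data from a dict named `table` and stores its result in a variable named `answer`; both conventions, the whitelist `SAFE_BUILTINS`, and the function name `run_solution_program` are hypothetical, not the paper's interface.

```python
from typing import Any, Dict, List

# Hypothetical whitelist: only simple numeric helpers are exposed to generated code.
SAFE_BUILTINS = {
    "abs": abs, "len": len, "max": max, "min": min,
    "round": round, "sorted": sorted, "sum": sum, "zip": zip,
}


def run_solution_program(program: str, table: Dict[str, List[Any]]) -> Any:
    """Execute a generated solution program and return its `answer` variable.

    Assumption: the program reads chart data from `table` and assigns its final
    result to `answer`. Restricting builtins limits accidental misuse but is NOT
    a security sandbox; untrusted code needs real process-level isolation.
    """
    namespace: Dict[str, Any] = {"__builtins__": SAFE_BUILTINS, "table": table}
    exec(program, namespace)
    if "answer" not in namespace:
        raise ValueError("solution program did not assign `answer`")
    return namespace["answer"]


if __name__ == "__main__":
    # Question: "What is the gap between the highest and lowest sales?"
    generated = (
        'sales = table["Sales"]\n'
        "answer = max(sales) - min(sales)\n"
    )
    chart = {"Year": [2019.0, 2020.0, 2021.0], "Sales": [1200.0, 1500.0, 900.0]}
    print(run_solution_program(generated, chart))  # prints 600.0
```

Because the answer is produced by ordinary code, every intermediate value (here `sales`) can be inspected after execution, which is what makes this style of numerical reasoning checkable.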

Problem

Interpreting complex chart images for accurate data understanding
Enhancing logical and numerical reasoning in chart question answering
Aligning visual perception with programmatic reasoning for chart analysis

Innovation

Lightweight Visual Perception Alignment Agent
Programmatic Solution Reasoning with LLMs
Aligns chart elements according to principles of human visual perception
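
Human readers link a bar to its category by its proximity to an axis label, and to its data series by the similarity of its color to a legend swatch. The sketch below is a hypothetical, rule-based illustration of that kind of perception-inspired alignment; the paper's VPAgent is a lightweight learned agent whose exact inputs and outputs are not described here. `Bar`, `TickLabel`, and `align_bars` are invented names, and element detection is assumed to have already happened.

```python
from dataclasses import dataclass
from math import dist
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]


@dataclass
class Bar:
    x_center: float  # horizontal center of the detected bar, in pixels
    color: RGB


@dataclass
class TickLabel:
    text: str
    x_center: float  # horizontal center of the x-axis label, in pixels


def align_bars(bars: List[Bar], ticks: List[TickLabel], legend: Dict[str, RGB]) -> List[Tuple[str, str]]:
    """Pair each bar with a category (proximity) and a series (color similarity).

    Hypothetical rule-based illustration only; not the paper's VPAgent.
    """
    aligned = []
    for bar in bars:
        # Proximity: the tick label whose center is horizontally closest to the bar.
        tick = min(ticks, key=lambda t: abs(t.x_center - bar.x_center))
        # Similarity: the legend entry whose swatch color is nearest in RGB space.
        series = min(legend, key=lambda name: dist(legend[name], bar.color))
        aligned.append((tick.text, series))
    return aligned


# Example with two categories and two series (pixel values are made up).
bars = [Bar(50, (31, 119, 180)), Bar(70, (255, 127, 14)), Bar(150, (30, 120, 178))]
ticks = [TickLabel("2019", 60), TickLabel("2020", 160)]
legend = {"Revenue": (31, 119, 180), "Cost": (255, 127, 14)}
# align_bars(bars, ticks, legend)
# returns [("2019", "Revenue"), ("2019", "Cost"), ("2020", "Revenue")]
```

Hand-written rules like these break down on cluttered or unconventional charts, which is one motivation for learning the alignment instead.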

Authors

Muye Huang
Xi’an Jiaotong University

Lingling Zhang
Assistant Professor, Xi’an Jiaotong University
Computer vision, Few-shot learning, Zero-shot learning

Lai Han
Xi’an Jiaotong University

Wenjun Wu
Xi’an Jiaotong University

Xinyu Zhang
Xi’an Jiaotong University

Jun Liu
Xi’an Jiaotong University