CharTool: Tool-Integrated Visual Reasoning for Chart Understanding

📅 2026-04-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses key limitations of multimodal large language models in chart understanding: the scarcity of high-quality training data, the difficulty of fine-grained visual grounding, and insufficient numerical reasoning accuracy. The authors first propose DuoChart, a scalable dual-source data pipeline that combines synthesized and real-world charts into diverse training data. They then introduce CharTool, which integrates image cropping and code execution tools into the multimodal reasoning pipeline and is trained with agentic reinforcement learning on DuoChart so that tool invocation is grounded in chart content. Evaluated on six chart understanding benchmarks, the approach consistently improves over strong MLLM baselines across model scales, with CharTool-7B gaining 8.0% on CharXiv (Reasoning) and 9.78% on ChartQAPro over its base model, while also generalizing positively to out-of-domain visual math reasoning benchmarks.
📝 Abstract
Charts are ubiquitous in scientific and financial literature for presenting structured data. However, chart reasoning remains challenging for multimodal large language models (MLLMs) due to the lack of high-quality training data, as well as the need for fine-grained visual grounding and precise numerical computation. To address these challenges, we first propose DuoChart, a scalable dual-source data pipeline that combines synthesized charts with real-world charts to construct diverse, high-quality chart training data. We then introduce CharTool, which equips MLLMs with external tools, including image cropping for localized visual perception and code-based computation for accurate numerical reasoning. Through agentic reinforcement learning on DuoChart, CharTool learns tool-integrated reasoning grounded in chart content. Extensive experiments on six chart benchmarks show that our method consistently improves over strong MLLM baselines across model scales. Notably, CharTool-7B outperforms the base model by **+8.0%** on CharXiv (Reasoning) and **+9.78%** on ChartQAPro, while achieving competitive performance with substantially larger or proprietary models. Moreover, CharTool demonstrates positive generalization to out-of-domain visual math reasoning benchmarks.
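To make the two tool types named in the abstract concrete, the following is a minimal sketch of how a cropping tool and a computation tool might be exposed to a model and dispatched. It is not the authors' implementation: the names `crop`, `compute`, and `run_tool_call`, the tool-call dictionary format, and the representation of an image as a list of pixel rows are all assumptions made for illustration, and the computation tool here only evaluates arithmetic expressions rather than running arbitrary code.

```python
import ast
import operator

# Hypothetical cropping tool: the image is assumed to be a list of pixel rows.
def crop(image, left, top, right, bottom):
    """Return the sub-region [top:bottom) x [left:right) of the image."""
    return [row[left:right] for row in image[top:bottom]]

# Arithmetic operators the hypothetical computation tool accepts.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}

def compute(expression):
    """Evaluate a plain arithmetic expression without calling eval()."""
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.operand))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expression, mode="eval"))

def run_tool_call(call, image):
    """Dispatch one tool call of the assumed form {"tool": ..., "args": {...}}."""
    if call["tool"] == "crop":
        return crop(image, **call["args"])
    if call["tool"] == "compute":
        return compute(**call["args"])
    raise KeyError(call["tool"])

# Toy 3x4 "image" whose pixel values encode their own row and column.
image = [[r * 10 + c for c in range(4)] for r in range(3)]
region = run_tool_call(
    {"tool": "crop", "args": {"left": 1, "top": 0, "right": 3, "bottom": 2}},
    image,
)
growth = run_tool_call(
    {"tool": "compute", "args": {"expression": "(120 - 80) / 80 * 100"}},
    image,
)
```

In a setup like this, the model would emit a crop call to zoom into a legend or axis region, read values from the returned region, and then emit a compute call so that the arithmetic is done by the tool rather than in free-form text.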
Problem

Research questions and friction points this paper is trying to address.

chart understanding
multimodal large language models
visual reasoning
numerical computation
visual grounding
Innovation

Methods, ideas, or system contributions that make the work stand out.

tool-integrated reasoning
multimodal large language models
chart understanding
DuoChart
visual grounding