🤖 AI Summary
Existing vision-language models exhibit limited generalization on unseen charts due to insufficient abstract, symbolic, and quantitative reasoning capabilities. To address this, this work proposes Chart-RL, the first framework to integrate mathematically verifiable rewards into reinforcement learning for chart question answering, thereby enhancing both reasoning and generalization. The approach demonstrates that task difficulty matters more than data volume: training on merely 10 complex samples outperforms training on 6,000 simple ones. Chart-RL achieves relative performance gains of 16.7% and 11.5% on MultiChartQA and ChartInsights, respectively, and surpasses baselines on 18 out of 25 perturbed chart types, showcasing strong cross-domain transferability.
📝 Abstract
Accurate chart comprehension represents a critical challenge in advancing multimodal learning systems, as extensive information is compressed into structured visual representations. However, existing vision-language models (VLMs) frequently struggle to generalize to unseen charts, because doing so requires abstract, symbolic, and quantitative reasoning over structured visual representations. In this work, we introduce Chart-RL, an effective reinforcement learning (RL) method that employs mathematically verifiable rewards to enhance chart question answering in VLMs. Our experiments demonstrate that Chart-RL consistently outperforms supervised fine-tuning (SFT) across different chart understanding benchmarks, achieving relative improvements of 16.7% on MultiChartQA and 11.5% on ChartInsights. In our robustness analysis, Chart-RL achieves improved performance in 18 of 25 perturbed chart categories, demonstrating strong consistency and reasoning capability across visual variations. Furthermore, we demonstrate that task difficulty and inherent complexity are more critical than data quantity in RL training. For instance, Chart-RL trained on merely 10 complex chart-query examples significantly outperforms models trained on over 6,000 simple examples. Additionally, training on challenging reasoning tasks not only improves in-domain generalization relative to simpler tasks, but also facilitates strong transfer to out-of-domain visual mathematical problems.
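To make the idea of a "mathematically verifiable reward" concrete, here is a minimal sketch of what such a reward function could look like for chart question answering: since most chart answers are numbers or short labels, correctness can be checked programmatically rather than judged by a model. The function name, tolerance, and matching logic below are illustrative assumptions, not details from the paper.

```python
import re

def verifiable_reward(prediction: str, gold: str, rel_tol: float = 1e-2) -> float:
    """Hypothetical binary reward for chart QA: 1.0 if the model's answer
    matches the gold answer, else 0.0. Numeric answers are compared with a
    relative tolerance; other answers need an exact case-insensitive match.
    (Illustrative sketch; not the paper's actual reward implementation.)"""
    def last_number(text: str):
        # Extract the last number in the text, tolerating commas ("1,234").
        matches = re.findall(r"-?\d[\d,]*\.?\d*", text)
        return float(matches[-1].replace(",", "")) if matches else None

    p_num, g_num = last_number(prediction), last_number(gold)
    if p_num is not None and g_num is not None:
        denom = max(abs(g_num), 1e-8)  # guard against division by zero
        return 1.0 if abs(p_num - g_num) / denom <= rel_tol else 0.0
    # Fall back to string comparison for categorical answers.
    return 1.0 if prediction.strip().lower() == gold.strip().lower() else 0.0
```

Because the reward is a deterministic function of the answer, it provides an unambiguous training signal for RL, avoiding the noise of learned reward models.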