Visual Programmability: A Guide for Code-as-Thought in Chart Understanding

📅 2025-09-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing chart understanding methods have two key limitations: (1) some rely on external tools, which makes the resulting systems brittle; and (2) others fine-tune specialized models with a fixed, single-path reasoning strategy such as text-based chain-of-thought, whose intermediate steps are hard to verify, so reward signals for factual accuracy are difficult to apply. To address these, we propose **Visual Programmability**, an adaptive multi-path reasoning framework for vision-language models. A learnable path-selection mechanism chooses, for each chart-question pair, between code-as-thought execution and direct visual analysis. A dual-reward reinforcement learning scheme jointly optimizes path selection and the accuracy of the extracted data. The approach performs strongly and robustly across multiple chart-understanding benchmarks, targets numerical hallucination through the data-accuracy reward, and makes the intermediate reasoning easier to inspect and verify.

📝 Abstract

Chart understanding presents a critical test to the reasoning capabilities of Vision-Language Models (VLMs). Prior approaches face critical limitations: some rely on external tools, making them brittle and constrained by a predefined toolkit, while others fine-tune specialist models that often adopt a single reasoning strategy, such as text-based chain-of-thought (CoT). The intermediate steps of text-based reasoning are difficult to verify, which complicates the use of reinforcement-learning signals that reward factual accuracy. To address this, we propose a Code-as-Thought (CaT) approach to represent the visual information of a chart in a verifiable, symbolic format. Our key insight is that this strategy must be adaptive: a fixed, code-only implementation consistently fails on complex charts where symbolic representation is unsuitable. This finding leads us to introduce Visual Programmability: a learnable property that determines if a chart-question pair is better solved with code or direct visual analysis. We implement this concept in an adaptive framework where a VLM learns to choose between the CaT pathway and a direct visual reasoning pathway. The selection policy of the model is trained with reinforcement learning using a novel dual-reward system. This system combines a data-accuracy reward to ground the model in facts and prevent numerical hallucination, with a decision reward that teaches the model when to use each strategy, preventing it from defaulting to a single reasoning mode. Experiments demonstrate strong and robust performance across diverse chart-understanding benchmarks. Our work shows that VLMs can be taught not only to reason but also how to reason, dynamically selecting the optimal reasoning pathway for each task.
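The dual-reward idea described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's formulation: the names `Rollout`, `data_accuracy_reward`, `decision_reward`, and `dual_reward`, the relative tolerance, the equal weights, and the assumption that each training example carries a ground-truth "programmable" label are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Rollout:
    """One model response (hypothetical structure, for illustration only)."""
    chose_code: bool             # True if the model picked the code pathway
    extracted: Sequence[float]   # values read off the chart (empty on the direct path)


def data_accuracy_reward(pred: Sequence[float], gold: Sequence[float], tol: float = 0.05) -> float:
    """Fraction of gold values matched by position within a relative tolerance."""
    if not gold:
        return 0.0
    hits = 0
    for p, g in zip(pred, gold):
        scale = max(abs(g), 1e-9)  # avoid division by zero for gold values of 0
        if abs(p - g) / scale <= tol:
            hits += 1
    return hits / len(gold)


def decision_reward(chose_code: bool, programmable: bool) -> float:
    """1.0 when the chosen pathway matches the (assumed) programmability label."""
    return 1.0 if chose_code == programmable else 0.0


def dual_reward(rollout: Rollout, gold: Sequence[float], programmable: bool,
                w_data: float = 0.5, w_decision: float = 0.5) -> float:
    """Weighted sum of both rewards; the weights are assumptions, not from the paper."""
    r_decision = decision_reward(rollout.chose_code, programmable)
    # Data accuracy is only scored when the model actually emitted extracted values.
    r_data = data_accuracy_reward(rollout.extracted, gold) if rollout.chose_code else 0.0
    return w_data * r_data + w_decision * r_decision


if __name__ == "__main__":
    gold_values = [10.0, 20.0, 30.0]
    good = Rollout(chose_code=True, extracted=[10.0, 20.5, 30.0])
    print(dual_reward(good, gold_values, programmable=True))
```

In this sketch the decision term alone cannot be maximized by always choosing one pathway, which mirrors the abstract's stated goal of preventing the model from defaulting to a single reasoning mode.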

Problem

Research questions and friction points this paper is trying to address.

Adaptive reasoning strategy selection for chart understanding

Verifiable symbolic representation of visual chart information

Preventing numerical hallucination and single-mode reasoning defaults

Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive Code-as-Thought framework

Dual-reward reinforcement learning system

Dynamic visual programmability selection
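To make the Code-as-Thought idea concrete, here is a rough sketch of what a code-pathway trace might look like for a simple bar chart. The chart values, the `chart` dictionary, and the `largest_gap` helper are invented for illustration and do not come from the paper; the point is only that the intermediate representation is symbolic and can be checked against the image.

```python
# Hypothetical Code-as-Thought trace: the model first transcribes the chart
# into a symbolic structure, then answers the question by executing code on it.
# All values below are made up for illustration.
chart = {"Q1": 12.0, "Q2": 18.5, "Q3": 15.0, "Q4": 21.0}


def largest_gap(data: dict[str, float]) -> tuple[str, str, float]:
    """Return the highest bar, the lowest bar, and the difference between them."""
    top = max(data, key=data.get)
    bottom = min(data, key=data.get)
    return top, bottom, data[top] - data[bottom]


top, bottom, gap = largest_gap(chart)
print(f"{top} exceeds {bottom} by {gap}")
```

Because the transcribed values are explicit, a reward signal can compare them with ground-truth data directly. For charts that do not reduce cleanly to such a table, the abstract argues that a direct visual pathway is the better choice.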

🔎 Similar Papers

VProChart: Answering Chart Question through Visual Perception Alignment Agent and Programmatic Solution Reasoning