CharTide: Data-Centric Chart-to-Code Generation via Tri-Perspective Tuning and Inquiry-Driven Evolution

📅 2026-04-23
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing chart-to-code generation methods struggle to disentangle visual perception from program logic because their training data is homogeneous, which limits the effective use of multimodal supervision signals. This work proposes CharTide, a framework that decouples visual perception, code logic, and modality fusion through a tri-perspective fine-tuning strategy and reframes alignment as a verifiable data validation problem grounded in information invariance. It introduces a 2M-sample tri-perspective training dataset and an inquiry-based reinforcement learning approach driven by atomic question answering, which uses answer accuracy as an objective reward signal and removes the need for rule matching or subjective evaluation. CharTide-7B/8B substantially outperforms open-source baselines on ChartMimic, Plot2Code, and ChartX, surpassing GPT-4o and approaching GPT-5-level performance.

📝 Abstract
Chart-to-code generation demands strict visual precision and syntactic correctness from Vision-Language Models (VLMs). However, existing approaches are fundamentally constrained by data-centric limitations: despite the availability of growing chart-to-code datasets, simply scaling homogeneous chart-code pairs conflates visual perception with program logic, preventing models from fully leveraging the richness of multimodal supervision. We present CharTide, a novel data-centric framework that systematically redesigns both training and alignment data for chart-to-code generation. First, we construct a 2M-sample dataset via a Tri-Perspective Tuning strategy, explicitly decoupling training into visual perception, pure-text code logic, and modality fusion streams, enabling a 7B model to surpass specialized baselines using only supervised data. Second, we reformulate alignment as a data verification problem rather than a heuristic scoring task. To this end, we introduce an Inquiry-Driven RL framework grounded in the principle of information invariance: a downstream model should yield consistent answers to identical visual queries across both original and generated charts. Moving beyond rigid rule matching or VLM scoring, we employ a frozen Inspector to objectively verify generated charts through atomic QA tasks, providing verifiable reward signals based on answer accuracy. Experiments on ChartMimic, Plot2Code, and ChartX show that CharTide-7B/8B significantly outperforms open-source baselines, surpasses GPT-4o, and is competitive with GPT-5.
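
The inquiry-driven reward described in the abstract can be illustrated with a minimal sketch. The abstract states only that a frozen Inspector answers atomic questions about the generated chart and that the reward is based on answer accuracy; everything else here is an assumption made for illustration, including the names `inquiry_reward`, `normalize`, and `toy_inspector`, the exact-match comparison after normalization, and the zero reward for an empty question set. The reference answers stand in for what the Inspector would answer on the original chart, so a high reward means the generated chart preserves the same information.

```python
from typing import Any, Callable, Sequence, Tuple

# Hypothetical interface: an inspector maps (chart, question) to an answer string.
# In the paper this role is played by a frozen model; here it is any callable.
Inspector = Callable[[Any, str], str]


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences do not count as errors."""
    return " ".join(answer.strip().lower().split())


def inquiry_reward(
    inspector: Inspector,
    generated_chart: Any,
    qa_pairs: Sequence[Tuple[str, str]],
) -> float:
    """Fraction of atomic questions whose answer on the generated chart
    matches the reference answer taken from the original chart.

    Assumption: exact match after normalization; the abstract does not
    specify how answers are compared.
    """
    if not qa_pairs:
        return 0.0  # assumption: no questions means no reward signal
    correct = sum(
        normalize(inspector(generated_chart, question)) == normalize(reference)
        for question, reference in qa_pairs
    )
    return correct / len(qa_pairs)


def toy_inspector(chart: Any, question: str) -> str:
    """Stand-in for a frozen Inspector: looks the question up in a dict
    that plays the role of a rendered chart."""
    return str(chart.get(question, "unknown"))


# Reference answers as they would be obtained from the original chart.
reference_qa = [("title", "Sales"), ("num_bars", "4"), ("x_label", "Quarter")]

# A generated chart that reproduces the title and bar count but mislabels the x-axis.
generated = {"title": "sales ", "num_bars": 4, "x_label": "Month"}

reward = inquiry_reward(toy_inspector, generated, reference_qa)
print(f"reward = {reward:.3f}")
```

In this toy run two of the three atomic questions are answered consistently, so the reward is 2/3. Scoring each question independently gives partial credit for charts that are mostly right, which a single pass/fail rule match would not.
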

Problem

Research questions and friction points this paper is trying to address.

chart-to-code generation
data-centric limitation
visual perception
program logic
multimodal supervision

Innovation

Methods, ideas, or system contributions that make the work stand out.

Tri-Perspective Tuning
Inquiry-Driven RL
data-centric chart-to-code generation
information invariance
modality fusion