AI Summary
Existing vision-language models often suffer from structural distortions and semantic hallucinations in chart-to-code generation due to superficial imitation. To address this, this work proposes Chart Specification, a structured intermediate representation that shifts the learning objective from textual mimicry to structure-aware semantic alignment. The approach constructs a balanced training set via denoising and introduces Spec-Align, a verifiable, fine-grained reward mechanism that enables reinforcement learning with explicit semantic guidance. By integrating structured representation and semantic alignment rewards for the first time, the method achieves remarkable data efficiency and generation quality with only 3K to 4K training samples, establishing new state-of-the-art results across three public benchmarks and outperforming baselines by up to 61.7% on complex tasks.
Abstract
Vision-Language Models (VLMs) have shown promise in generating plotting code from chart images, yet achieving structural fidelity remains challenging. Existing approaches largely rely on supervised fine-tuning, encouraging surface-level token imitation rather than faithful modeling of underlying chart structure, which often leads to hallucinated or semantically inconsistent outputs. We propose Chart Specification, a structured intermediate representation that shifts training from text imitation to semantically grounded supervision. Chart Specification filters syntactic noise to construct a structurally balanced training set and supports a Spec-Align Reward that provides fine-grained, verifiable feedback on structural correctness, enabling reinforcement learning to enforce consistent plotting logic. Experiments on three public benchmarks show that our method consistently outperforms prior approaches. With only 3K training samples, we achieve strong data efficiency, surpassing leading baselines by up to 61.7% on complex benchmarks, and scaling to 4K samples establishes new state-of-the-art results across all evaluated metrics. Overall, our results demonstrate that precise structural supervision offers an efficient pathway to high-fidelity chart-to-code generation. Code and dataset are available at: https://github.com/Mighten/chart-specification-paper
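The abstract does not give the schema of a Chart Specification or the formula of the Spec-Align Reward, so the following is only a rough illustration of the general idea: comparing charts through a structured description rather than through code tokens. The `ChartSpec` fields and the `spec_match_reward` function are hypothetical names chosen for this sketch and are not taken from the paper or its repository.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ChartSpec:
    """Hypothetical structured description of a chart.

    The field set is an assumption made for illustration; the paper's
    actual Chart Specification may look quite different.
    """
    chart_type: str
    num_series: int
    x_label: str = ""
    y_label: str = ""
    title: str = ""
    has_legend: bool = False


def spec_match_reward(predicted: ChartSpec, reference: ChartSpec) -> float:
    """Return the fraction of spec fields on which the two specs agree.

    This is a simple stand-in for a fine-grained, verifiable reward:
    each field is checked independently, so partial structural
    correctness earns partial credit (a value between 0.0 and 1.0).
    """
    names = [f.name for f in fields(ChartSpec)]
    matches = sum(getattr(predicted, n) == getattr(reference, n) for n in names)
    return matches / len(names)
```

Under these assumptions, a prediction that gets the chart type, axis labels, and title right but miscounts the series and omits the legend would agree on four of six fields. A token-level imitation loss would not expose that kind of field-by-field signal.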