Chart Specification: Structural Representations for Incentivizing VLM Reasoning in Chart-to-Code Generation

2026-02-11
AI Summary
Existing vision-language models often suffer from structural distortions and semantic hallucinations in chart-to-code generation because supervised fine-tuning encourages superficial token imitation. To address this, this work proposes Chart Specification, a structured intermediate representation that shifts the learning objective from textual mimicry to structure-aware semantic alignment. The approach filters syntactic noise to construct a structurally balanced training set and introduces the Spec-Align Reward, a verifiable, fine-grained reward that gives reinforcement learning explicit feedback on structural correctness. With only 3K training samples, the method surpasses leading baselines by up to 61.7% on complex benchmarks; scaling to 4K samples establishes new state-of-the-art results across all evaluated metrics on three public benchmarks.

Abstract
Vision-Language Models (VLMs) have shown promise in generating plotting code from chart images, yet achieving structural fidelity remains challenging. Existing approaches largely rely on supervised fine-tuning, encouraging surface-level token imitation rather than faithful modeling of underlying chart structure, which often leads to hallucinated or semantically inconsistent outputs. We propose Chart Specification, a structured intermediate representation that shifts training from text imitation to semantically grounded supervision. Chart Specification filters syntactic noise to construct a structurally balanced training set and supports a Spec-Align Reward that provides fine-grained, verifiable feedback on structural correctness, enabling reinforcement learning to enforce consistent plotting logic. Experiments on three public benchmarks show that our method consistently outperforms prior approaches. With only 3K training samples, we achieve strong data efficiency, surpassing leading baselines by up to 61.7% on complex benchmarks, and scaling to 4K samples establishes new state-of-the-art results across all evaluated metrics. Overall, our results demonstrate that precise structural supervision offers an efficient pathway to high-fidelity chart-to-code generation. Code and dataset are available at: https://github.com/Mighten/chart-specification-paper
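
To make the idea of a structural intermediate representation and a verifiable structural reward more concrete, the following is a minimal sketch only. The abstract does not give the actual Chart Specification schema or the Spec-Align Reward formula, so everything here is an assumption made for illustration: the function names `extract_spec` and `spec_reward`, the call-to-chart-type mapping, the toy spec fields (`marks`, `labels`), and the equal weighting of the two score components.

```python
import ast
from collections import Counter

# Hypothetical mapping from matplotlib method names to chart types (assumption).
PLOT_CALLS = {"plot": "line", "bar": "bar", "barh": "bar",
              "scatter": "scatter", "pie": "pie", "hist": "histogram"}
# Hypothetical mapping from labelling methods to spec fields (assumption).
LABEL_CALLS = {"set_title": "title", "title": "title",
               "set_xlabel": "xlabel", "xlabel": "xlabel",
               "set_ylabel": "ylabel", "ylabel": "ylabel"}


def extract_spec(code: str) -> dict:
    """Parse plotting code (without running it) into a toy structural spec."""
    spec = {"marks": Counter(), "labels": {}}
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            name = node.func.attr
            if name in PLOT_CALLS:
                # Count how many marks of each chart type the code draws.
                spec["marks"][PLOT_CALLS[name]] += 1
            elif name in LABEL_CALLS and node.args:
                arg = node.args[0]
                # Only literal string labels are recorded.
                if isinstance(arg, ast.Constant) and isinstance(arg.value, str):
                    spec["labels"][LABEL_CALLS[name]] = arg.value
    return spec


def spec_reward(pred_code: str, ref_code: str) -> float:
    """Toy reward in [0, 1]: mean of mark-type F1 and label accuracy."""
    try:
        pred = extract_spec(pred_code)
    except SyntaxError:
        return 0.0  # unparsable code earns no reward
    ref = extract_spec(ref_code)

    # Multiset overlap of chart types gives an F1-style score.
    overlap = sum((pred["marks"] & ref["marks"]).values())
    n_pred = sum(pred["marks"].values())
    n_ref = sum(ref["marks"].values())
    if n_pred == 0 or n_ref == 0:
        mark_f1 = 1.0 if n_pred == n_ref else 0.0
    else:
        mark_f1 = 2 * overlap / (n_pred + n_ref)

    # Fraction of reference labels reproduced exactly.
    if ref["labels"]:
        hits = sum(pred["labels"].get(k) == v for k, v in ref["labels"].items())
        label_acc = hits / len(ref["labels"])
    else:
        label_acc = 1.0
    return 0.5 * mark_f1 + 0.5 * label_acc
```

Because the score is computed from parsed structure rather than token overlap, two programs that draw the same chart with different variable names or formatting receive the same reward, which is the kind of property the abstract attributes to structure-level supervision.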
Problem

Research questions and friction points this paper is trying to address.

chart-to-code generation
structural fidelity
vision-language models
semantic consistency
hallucination
Innovation

Methods, ideas, or system contributions that make the work stand out.

Chart Specification
Structural Representation
Vision-Language Models
Reinforcement Learning
Chart-to-Code Generation

Authors

Minggui He
Mingchen Dai
Jian Zhang
Yilun Liu
Shimin Tao (2012 Lab, Huawei Co., Ltd.; Machine Translation, AIOps, Log Analysis)
Pufan Zeng
Osamu Yoshie (Waseda University)
Yuya Ieiri