From Charts to Code: A Hierarchical Benchmark for Multimodal Models

📅 2025-10-20
🤖 AI Summary
Evaluation of large multimodal models (LMMs) for chart understanding and code generation (chart2code) lacks realistic, hierarchical, and user-centric frameworks. To address this, we introduce Chart2Code, the first difficulty-graded, application-oriented multimodal benchmark for chart2code. It comprises three progressively challenging tiers: chart reconstruction, interactive editing, and long-table-to-chart generation. Evaluation integrates two dimensions, code correctness and visual fidelity, validated via human annotation combined with automated verification across 22 chart types and 2,023 high-quality samples. We comprehensively evaluate 25 state-of-the-art (SOTA) multimodal models. Results reveal severe limitations: even the strongest model, GPT-5, achieves only 0.57 code accuracy and a 0.22 visual-quality score on editing tasks. These findings underscore the task's inherent difficulty and expose critical model deficiencies. Chart2Code establishes a scalable, diagnostic evaluation paradigm to guide future research in multimodal chart understanding and generation.

📝 Abstract
We introduce Chart2Code, a new benchmark for evaluating the chart understanding and code generation capabilities of large multimodal models (LMMs). Chart2Code is explicitly designed from a user-driven perspective, capturing diverse real-world scenarios and progressively increasing task difficulty. It consists of three levels: Level 1 (Chart Reproduction) asks models to reproduce charts from a reference figure and a user query; Level 2 (Chart Editing) involves complex modifications such as changing chart types or adding elements; and Level 3 (Long-Table to Chart Generation) requires models to transform long, information-dense tables into faithful charts following user instructions. To our knowledge, this is the first hierarchical benchmark that reflects practical chart2code usage while systematically scaling task complexity. In total, Chart2Code contains 2,023 tasks across 22 chart types, paired with multi-level evaluation metrics that assess both code correctness and the visual fidelity of rendered charts. We benchmark 25 state-of-the-art (SoTA) LMMs, including both proprietary and the latest open-source models such as GPT-5, Qwen2.5-VL, InternVL3/3.5, MiMo-VL, and Seed-1.6-VL. Experimental results demonstrate that even the SoTA model GPT-5 averages only 0.57 on code-based evaluation and 0.22 on chart-quality assessment across the editing tasks, underscoring the difficulty of Chart2Code. We anticipate this benchmark will drive advances in multimodal reasoning and foster the development of more robust and general-purpose LMMs. Our code and data are available on Chart2Code.
Problem

Research questions and friction points this paper is trying to address.

Evaluating chart understanding and code generation in multimodal models
Systematically scaling task complexity from reproduction to transformation
Assessing visual fidelity and code correctness across diverse chart types
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical benchmark for multimodal chart understanding
Three-level task complexity from reproduction to generation
Multi-level evaluation metrics for code and visual fidelity
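The dual evaluation dimensions listed above, code correctness and visual fidelity, can be sketched as two toy checks. Both functions below are illustrative stand-ins under stated assumptions, not the benchmark's actual metrics: the real pipeline combines automated verification with human annotation.

```python
import os
import subprocess
import sys
import tempfile

def code_executes(code: str, timeout: int = 30) -> bool:
    """Code-correctness stand-in: does the generated script run to completion?
    The benchmark's code-based evaluation is richer; this covers only executability."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        result = subprocess.run([sys.executable, path],
                                capture_output=True, timeout=timeout)
        return result.returncode == 0
    finally:
        os.unlink(path)

def pixel_fidelity(img_a, img_b):
    """Toy visual-fidelity score in [0, 1]: mean per-pixel agreement of two
    grayscale grids (lists of rows of 0-255 ints), rendered from the generated
    and reference charts. A stand-in for the benchmark's visual scoring."""
    total, agree = 0, 0.0
    for row_a, row_b in zip(img_a, img_b):
        for pa, pb in zip(row_a, row_b):
            agree += 1 - abs(pa - pb) / 255
            total += 1
    return agree / total if total else 0.0
```

For example, `code_executes("print('ok')")` returns `True`, while a script that raises returns `False`; identical pixel grids score 1.0 and maximally different ones 0.0.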
Jiahao Tang
CSU-JPG, Central South University
Henry Hengyuan Zhao
Ph.D. student at National University of Singapore
Multimodal Reasoning, AI Agent, Human-AI Interaction
Lijian Wu
CSU-JPG, Central South University
Yifei Tao
Nanyang Technological University
Dongxing Mao
CSU-JPG, Central South University
Yang Wan
CSU-JPG, Central South University
Jingru Tan
CSU-JPG, Central South University
Min Zeng
School of Computer Science and Engineering, Central South University
Bioinformatics, Machine Learning, Deep Learning
Min Li
CSU-JPG, Central South University
Alex Jinpeng Wang
CSU-JPG, Central South University