RealChart2Code: Advancing Chart-to-Code Generation with Real Data and Multi-Task Evaluation

📅 2026-03-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Vision-language models have not been systematically evaluated on generating code for complex, multi-panel charts built from real-world data, particularly in multi-turn interactive refinement. This work introduces RealChart2Code, a large-scale benchmark of over 2,800 real chart–code pairs, and uses it for the first systematic assessment of 14 state-of-the-art models on both single-turn code generation and multi-turn iterative refinement. Results show a significant performance drop when models handle complex multi-panel visualizations, with proprietary models consistently outperforming open-weight counterparts. Even the most advanced models often fail to reproduce real-world visualizations accurately, leaving substantial room for improvement.

📝 Abstract

Vision-Language Models (VLMs) have demonstrated impressive capabilities in code generation across various domains. However, their ability to replicate complex, multi-panel visualizations from real-world data remains largely unassessed. To address this gap, we introduce RealChart2Code, a new large-scale benchmark with over 2,800 instances grounded in authentic datasets and featuring tasks with clear analytical intent. Crucially, it is the first benchmark to systematically evaluate chart generation from large-scale raw data and assess iterative code refinement in a multi-turn conversational setting. Our comprehensive evaluation of 14 leading VLMs on RealChart2Code reveals significant performance degradation compared to simpler benchmarks, highlighting their struggles with complex plot structures and authentic data. Our analysis uncovers a substantial performance gap between proprietary and open-weight models and confirms that even state-of-the-art VLMs often fail to accurately replicate intricate, multi-panel charts. These findings provide valuable insights into the current limitations of VLMs and guide future research directions. We release the benchmark and code at https://github.com/Speakn0w/RealChart2Code.

Problem

Research questions and friction points this paper is trying to address.

chart-to-code generation

vision-language models

real-world data

multi-panel visualizations

code generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

chart-to-code generation

vision-language models

real-world data