🤖 AI Summary
Current large language models (LLMs) face two key bottlenecks in generating complex, real-world data visualizations: (1) prevalent benchmarks cover only a narrow subset of chart types, lacking 3D plots, volumetric charts, gridded charts, and similar; and (2) standard supervised fine-tuning fails to exploit the rich cross-modal relationships among natural-language descriptions, code, tabular data, and rendered figures. To address these gaps, the authors introduce Text2Chart31, a Matplotlib-based text-to-chart dataset spanning 31 distinct plot types with 11.1K tuples of descriptions, code, data tables, and plots, together with a hierarchical generation pipeline and a reinforcement learning-based instruction tuning technique that requires no human feedback. Experiments show substantial performance gains: smaller fine-tuned models outperform larger open-source LLMs on data visualization tasks and are comparable to state-of-the-art proprietary models.
📝 Abstract
Large language models (LLMs) have demonstrated strong capabilities across various language tasks, notably through instruction-tuning methods. However, LLMs face challenges in visualizing complex, real-world data through charts and plots. First, existing datasets rarely cover a full range of chart types, such as 3D, volumetric, and gridded charts. Second, supervised fine-tuning methods do not fully leverage the intricate relationships within rich datasets, including text, code, and figures. To address these challenges, we propose a hierarchical pipeline and a new dataset for chart generation. Our dataset, Text2Chart31, includes 31 unique plot types based on the Matplotlib library, with 11.1K tuples of descriptions, code, data tables, and plots. Moreover, we introduce a reinforcement learning-based instruction tuning technique for chart generation tasks without requiring human feedback. Our experiments show that this approach significantly enhances model performance, enabling smaller models to outperform larger open-source models and be comparable to state-of-the-art proprietary models in data visualization tasks.
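To make the coverage gap concrete, a 3D surface plot is one example of the chart families the abstract notes are missing from prior benchmarks. Below is a minimal, self-contained Matplotlib sketch of such a chart; the function and variable names are illustrative and are not drawn from the Text2Chart31 dataset itself.

```python
# Illustrative sketch of a 3D chart type (surface plot) that most
# text-to-chart benchmarks omit. Names here are our own, not the
# dataset's schema.
import matplotlib
matplotlib.use("Agg")  # headless backend; no display required
import matplotlib.pyplot as plt
import numpy as np

def make_surface_plot():
    """Render a simple 3D surface z = sin(sqrt(x^2 + y^2))."""
    x = np.linspace(-5, 5, 50)
    y = np.linspace(-5, 5, 50)
    X, Y = np.meshgrid(x, y)
    Z = np.sin(np.sqrt(X**2 + Y**2))

    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    ax.plot_surface(X, Y, Z, cmap="viridis")
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.set_zlabel("z")
    return fig

fig = make_surface_plot()
```

A text-to-chart model in this setting would be expected to emit code like the above directly from a natural-language description of the desired surface.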