🤖 AI Summary
Infographic charts have high visual and structural complexity, yet current large vision-language models (LVLMs) struggle to understand and generate them because they are trained mostly on plain charts. To address this, we propose ChartGalaxy, the first million-scale infographic chart dataset, covering 75 chart types, 330 chart variations, and 68 layout templates. We systematically model the visual and structural complexity of infographics for the first time. We combine design patterns induced from real-world infographic charts with programmatic synthesis to build a dataset that balances authenticity and controllability. Using ChartGalaxy, we fine-tune LVLMs in multiple stages and establish the first benchmark for infographic chart code generation. We also enable example-based generation of high-fidelity, editable infographic charts. Fine-tuning on the dataset improves infographic chart understanding, and the dataset also supports code generation and example-based generation. Together, these results address a key gap in LVLM capabilities for understanding and generating visually rich charts.
📝 Abstract
Infographic charts are a powerful medium for communicating abstract data by combining visual elements (e.g., charts, images) with textual information. However, their visual and structural richness poses challenges for large vision-language models (LVLMs), which are typically trained on plain charts. To bridge this gap, we introduce ChartGalaxy, a million-scale dataset designed to advance the understanding and generation of infographic charts. The dataset is constructed through an inductive process that identifies 75 chart types, 330 chart variations, and 68 layout templates from real infographic charts and uses them to create synthetic ones programmatically. We showcase the utility of this dataset through: 1) improving infographic chart understanding via fine-tuning, 2) benchmarking code generation for infographic charts, and 3) enabling example-based infographic chart generation. By capturing the visual and structural complexity of real design, ChartGalaxy provides a useful resource for enhancing multimodal reasoning and generation in LVLMs.
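The abstract does not describe how the induced chart variations and layout templates are turned into synthetic charts. The sketch below illustrates one simple form of template-based synthesis, under the assumption that each synthetic chart is a declarative spec pairing a data table with one chart variation and one layout template. All names (`ChartVariation`, `LayoutTemplate`, `synthesize_spec`, and the example entries) are hypothetical and are not the authors' implementation. Rendering the spec into an image is out of scope here.

```python
import random
from dataclasses import dataclass


# Hypothetical inventory entries; names are illustrative, not from ChartGalaxy.
@dataclass(frozen=True)
class ChartVariation:
    chart_type: str  # e.g. "bar"
    name: str        # a visual variant of that chart type


@dataclass(frozen=True)
class LayoutTemplate:
    name: str
    slots: tuple[str, ...]  # regions the layout expects, e.g. title, chart, image


def synthesize_spec(data, variations, layouts, rng):
    """Pair a data table with one sampled chart variation and layout template."""
    variation = rng.choice(variations)
    layout = rng.choice(layouts)
    return {
        "layout": layout.name,
        "chart_type": variation.chart_type,
        "variation": variation.name,
        "data": data,
        # Placeholder content per slot; a fuller pipeline would fill these
        # with titles, icons, or images before rendering.
        "slots": {slot: f"<{slot}>" for slot in layout.slots},
    }


VARIATIONS = [
    ChartVariation("bar", "horizontal_bar_with_icons"),
    ChartVariation("pie", "donut_with_labels"),
]
LAYOUTS = [
    LayoutTemplate("title_top_chart_center", ("title", "chart")),
    LayoutTemplate("chart_left_image_right", ("title", "chart", "image")),
]
DATA = [{"label": "A", "value": 3}, {"label": "B", "value": 5}]

rng = random.Random(0)  # seeded so the sampled specs are reproducible
specs = [synthesize_spec(DATA, VARIATIONS, LAYOUTS, rng) for _ in range(4)]
for spec in specs:
    print(spec["chart_type"], spec["variation"], spec["layout"])
```

Keeping the output as a spec rather than an image separates sampling from rendering. The same spec could then be rendered by code, which is one plausible way a dataset like this could support code-generation tasks.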