ChartComplete: A Taxonomy-based Inclusive Chart Dataset

📅 2026-01-15
🤖 AI Summary
Existing chart understanding benchmarks cover only a small set of chart types, which makes it hard to evaluate how well multimodal large language models (MLLMs) generalize across visualization forms. To address this gap, this work presents ChartComplete, a dataset built on a chart taxonomy borrowed from the visualization community and covering 30 distinct chart types. The dataset is a collection of classified chart images and does not include a learning signal. By expanding chart-type coverage beyond current benchmarks, ChartComplete gives the community a base to build on for evaluating multimodal models across a broader range of chart types.

📝 Abstract
With advancements in deep learning (DL) and computer vision techniques, the field of chart understanding is evolving rapidly. In particular, multimodal large language models (MLLMs) are proving to be efficient and accurate in understanding charts. To accurately measure the performance of MLLMs, the research community has developed multiple datasets to serve as benchmarks. By examining these datasets, we found that they are all limited to a small set of chart types. To bridge this gap, we propose the ChartComplete dataset. The dataset is based on a chart taxonomy borrowed from the visualization community, and it covers thirty different chart types. The dataset is a collection of classified chart images and does not include a learning signal. We present the ChartComplete dataset as is for the community to build upon.
Problem

Research questions and friction points this paper is trying to address.

chart understanding
multimodal large language models
benchmark dataset
chart taxonomy
inclusive chart dataset
Innovation

Methods, ideas, or system contributions that make the work stand out.

ChartComplete
chart taxonomy
inclusive dataset
multimodal large language models
chart understanding
Ahmad Mustapha
Researcher
deep learning
Charbel Toumieh
American University of Beirut, Lebanon
Mariette Awad
American University of Beirut, Lebanon