ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning

📅 2024-02-19
🏛️ arXiv.org
📈 Citations: 30 (5 influential)
🤖 AI Summary
Existing multimodal large language models (MLLMs) offer poor interpretability in complex chart understanding and reasoning, and the field lacks a comprehensive, fine-grained evaluation benchmark. Method: We introduce ChartX, the first benchmark covering 18 chart types, 7 reasoning tasks, and 22 academic domains, and propose ChartVLM, a dedicated chart foundation model. ChartVLM integrates chart-structure-aware visual encoding, multimodal collaborative representation learning, and task-adaptive instruction tuning to make its pattern recognition more interpretable. Contribution/Results: On ChartX, ChartVLM significantly outperforms mainstream MLLMs and achieves results comparable to GPT-4V. Both the open-source code and the ChartX dataset have been adopted by the research community. This work bridges two critical gaps in chart understanding: the absence of a systematic, domain-diverse evaluation framework and the lack of specialized, interpretable model architectures.
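To make the benchmark's three axes concrete, here is a minimal sketch of how an evaluation harness over such a dataset might aggregate scores per (chart type, task) cell. It is purely illustrative, not the authors' released code; the sample layout, the model call signature, and the `score` function are all assumptions.

```python
from collections import defaultdict

def evaluate(model, samples, score):
    """Hypothetical ChartX-style harness: mean score per (chart_type, task) cell.

    Each sample is assumed to carry a chart image, its chart type (one of 18),
    the task name (one of 7, e.g. QA or summarization), and a gold answer.
    """
    cells = defaultdict(list)
    for s in samples:
        pred = model(image=s["image"], task=s["task"])  # assumed model interface
        cells[(s["chart_type"], s["task"])].append(score(pred, s["answer"], s["task"]))
    # Fine-grained view: one mean per chart-type x task cell (up to 18 x 7 cells).
    return {cell: sum(vals) / len(vals) for cell, vals in cells.items()}
```

Reporting per-cell means rather than a single aggregate is what makes an 18-type x 7-task benchmark fine-grained: a model can look strong on average while failing on specific chart types or tasks.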

📝 Abstract
Many versatile Multi-modal Large Language Models (MLLMs) have emerged in recent years. However, their capacity to query information depicted in visual charts and to reason over the queried contents remains under-explored. In this paper, to comprehensively and rigorously benchmark the ability of off-the-shelf MLLMs in the chart domain, we construct ChartX, a multi-modal evaluation set covering 18 chart types, 7 chart tasks, and 22 disciplinary topics, built from high-quality chart data. In addition, we develop ChartVLM to offer a new perspective on handling multi-modal tasks that strongly depend on interpretable patterns, such as reasoning tasks over charts or geometric images. We evaluate the chart-related abilities of mainstream MLLMs and our ChartVLM on the proposed ChartX evaluation set. Extensive experiments demonstrate that ChartVLM surpasses both versatile and chart-specialized large models, achieving results comparable to GPT-4V. We believe that our study can pave the way for further exploration toward a more comprehensive chart evaluation set and more interpretable multi-modal models. Both ChartX and ChartVLM are available at: https://github.com/Alpha-Innovator/ChartVLM
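The abstract's emphasis on "interpretable patterns" suggests a pipeline that first recovers the chart's underlying structure and then reasons over that explicit intermediate rather than answering directly from pixels. A minimal sketch of this idea follows; `extract_table` and `reason_over_table` are hypothetical stand-ins for learned components, not the ChartVLM API.

```python
def chart_reasoning(image, question, extract_table, reason_over_table):
    """Illustrative two-stage, interpretable chart pipeline (assumed design).

    Stage 1: perception -- convert the chart image into an explicit data
    table (e.g. CSV text) that a human can inspect and audit.
    Stage 2: cognition -- answer the question conditioned on that table
    instead of on raw pixels.
    """
    table_csv = extract_table(image)                 # hypothetical perception module
    answer = reason_over_table(table_csv, question)  # hypothetical reasoning module
    return answer, table_csv                         # expose the intermediate for inspection
```

Returning the intermediate table is what makes errors attributable: a wrong answer can be traced to either faulty extraction or faulty reasoning, rather than disappearing into an end-to-end black box.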
Problem

Research questions and friction points this paper is trying to address.

Language Model
Complex Chart Analysis
Understanding Enhancement
Innovation

Methods, ideas, or system contributions that make the work stand out.

ChartX
ChartVLM
GPT-4V-comparable Performance