ChartMoE: Mixture of Expert Connector for Advanced Chart Understanding

📅 2024-09-05
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address insufficient cross-modal semantic alignment among charts, tables, JSON, and code in multimodal large language models (MLLMs), this paper proposes a mixture-of-experts (MoE) connector, replacing the conventional linear projector, to enable multi-granularity semantic alignment for chart understanding. The authors introduce the first MoE-based connector designed for the chart domain, construct ChartMoE-Align, a dataset of over 900K chart-table-JSON-code quadruples, and propose four expert initialization strategies, further refining the MoE connector and LLM via high-quality knowledge learning. On the ChartQA benchmark, the method achieves 84.64% accuracy, surpassing the prior state of the art (80.48%), with notable gains in structural parsing, numerical reasoning, and chart-to-code generation.

📝 Abstract
Automatic chart understanding is crucial for content comprehension and document parsing. Multimodal large language models (MLLMs) have demonstrated remarkable capabilities in chart understanding through domain-specific alignment and fine-tuning. However, the application of alignment training within the chart domain is still underexplored. To address this, we propose ChartMoE, which employs the mixture-of-experts (MoE) architecture to replace the traditional linear projector to bridge the modality gap. Specifically, we train multiple linear connectors through distinct alignment tasks, which are utilized as the foundational initialization parameters for different experts. Additionally, we introduce ChartMoE-Align, a dataset with over 900K chart-table-JSON-code quadruples to conduct three alignment tasks (chart-table/JSON/code). Combined with the vanilla connector, we initialize different experts in four distinct ways and adopt high-quality knowledge learning to further refine the MoE connector and LLM parameters. Extensive experiments demonstrate the effectiveness of the MoE connector and our initialization strategy, e.g., ChartMoE improves the accuracy of the previous state-of-the-art from 80.48% to 84.64% on the ChartQA benchmark.
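The core architectural idea in the abstract, swapping the single linear projector for a routed set of linear experts, each initialized from a connector trained on a different alignment task, can be sketched as follows. This is a minimal illustration in PyTorch: the dimensions, top-k routing, and class/parameter names are assumptions for the sketch, not details taken from the paper.

```python
import torch
import torch.nn as nn


class MoEConnector(nn.Module):
    """Sketch of an MoE connector: a learned router mixes several linear
    projectors that map vision features into the LLM embedding space.

    In ChartMoE's scheme, each expert would be initialized from a linear
    connector pre-trained on a distinct alignment task (chart-table,
    chart-JSON, chart-code, plus the vanilla connector). Here the experts
    are simply randomly initialized for illustration.
    """

    def __init__(self, vis_dim=1024, llm_dim=4096, num_experts=4, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Linear(vis_dim, llm_dim) for _ in range(num_experts)
        )
        self.router = nn.Linear(vis_dim, num_experts)  # per-token routing
        self.top_k = top_k
        self.llm_dim = llm_dim

    def forward(self, x):
        # x: (batch, tokens, vis_dim) visual features from the encoder
        logits = self.router(x)                          # (B, T, E)
        weights, idx = logits.topk(self.top_k, dim=-1)   # route to top-k experts
        weights = weights.softmax(dim=-1)                # normalize over chosen experts
        out = torch.zeros(*x.shape[:-1], self.llm_dim,
                          device=x.device, dtype=x.dtype)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., k] == e                  # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out                                       # (B, T, llm_dim)
```

Routing per token lets different chart regions (axes, legends, data marks) be projected by different experts, which is the multi-granularity alignment the summary describes.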
Problem

Research questions and friction points this paper is trying to address.

Bridging modality gap in chart understanding
Enhancing alignment training in chart domain
Improving accuracy in chart comprehension tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixture-of-experts (MoE) connector architecture
ChartMoE-Align dataset (900K+ chart-table-JSON-code quadruples)
High-quality knowledge learning for refinement