CFBenchmark-MM: Chinese Financial Assistant Benchmark for Multimodal Large Language Model

📅 2025-06-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing multimodal large language models (MLLMs) show limited efficiency and robustness when processing multimodal financial data, such as charts, graphs, and tables, in Chinese contexts. Method: We introduce CFBenchmark-MM, the first Chinese multimodal financial evaluation benchmark, comprising over 9,000 image–text question–answer pairs covering bar charts, line charts, pie charts, structural diagrams, and tabular data. We propose a staged visual-input assessment mechanism and a progressive, domain-specific evaluation framework that jointly attribute performance to visual parsing capability and financial concept comprehension. Contribution/Results: Comprehensive evaluation reveals that current MLLMs underperform significantly on financial multimodal tasks, primarily due to visual misinterpretation and misunderstanding of financial concepts. Our analysis empirically validates the necessity of domain-adaptive optimization and demonstrates substantial room for improvement through targeted fine-tuning and architectural enhancements.

📝 Abstract
Multimodal Large Language Models (MLLMs) have rapidly evolved with the growth of Large Language Models (LLMs) and are now applied in various fields. In finance, the integration of diverse modalities such as text, charts, and tables is crucial for accurate and efficient decision-making. Therefore, an effective evaluation system that incorporates these data types is essential for advancing financial applications. In this paper, we introduce CFBenchmark-MM, a Chinese multimodal financial benchmark with over 9,000 image–question pairs featuring tables, histogram charts, line charts, pie charts, and structural diagrams. Additionally, we develop a staged evaluation system to assess MLLMs in handling multimodal information by providing different visual content step by step. Despite MLLMs having inherent financial knowledge, experimental results still show limited efficiency and robustness in handling multimodal financial context. Further analysis of incorrect responses reveals that misinterpretation of visual content and misunderstanding of financial concepts are the primary issues. Our research validates the significant, yet underexploited, potential of MLLMs in financial analysis, highlighting the need for further development and domain-specific optimization to encourage their enhanced use in the financial domain.
Problem

Research questions and friction points this paper is trying to address.

Evaluating MLLMs' performance in multimodal financial analysis
Addressing misinterpretation of visual content in financial contexts
Improving efficiency and robustness in financial decision-making
Innovation

Methods, ideas, or system contributions that make the work stand out.

Chinese multimodal financial benchmark with over 9,000 image–question pairs
Staged evaluation system for multimodal information
Analysis reveals visual and financial concept issues
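The staged evaluation idea above, querying a model with progressively richer visual context and attributing errors to visual parsing versus concept understanding, can be sketched as a simple loop. This is a hypothetical illustration of the general approach, not the paper's actual protocol: the stage names, the `model(context, question)` interface, and the sample fields are all assumptions.

```python
# Hypothetical sketch of a staged visual-input evaluation loop.
# Each stage gives the model more context; comparing per-stage accuracy
# hints at whether failures stem from visual parsing or comprehension.

def staged_evaluation(model, sample):
    """Query `model` with progressively richer context for one QA sample
    and record whether each stage's answer matches the gold answer."""
    stages = {
        "image_only": [sample["image"]],
        "image_plus_caption": [sample["image"], sample["caption"]],
        "image_plus_table": [sample["image"], sample["table_text"]],
    }
    results = {}
    for name, context in stages.items():
        answer = model(context, sample["question"])  # hypothetical interface
        results[name] = (answer == sample["gold_answer"])
    return results

# Toy model and sample to exercise the loop: the "model" answers
# correctly only when it receives context beyond the raw image.
toy_model = lambda context, question: "up" if len(context) > 1 else "down"
toy_sample = {
    "image": "<chart bytes>",
    "caption": "Quarterly revenue",
    "table_text": "Q1 10 | Q2 12",
    "question": "What is the revenue trend?",
    "gold_answer": "up",
}
print(staged_evaluation(toy_model, toy_sample))
# → {'image_only': False, 'image_plus_caption': True, 'image_plus_table': True}
```

A pattern like this one (correct only once textual context is added) would suggest the bottleneck is visual parsing rather than financial reasoning.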
Jiangtong Li
School of Computer Science and Technology, Tongji University
Yiyun Zhu
School of Computer Science and Technology, Tongji University
Dawei Cheng
Tongji University
Research interests: Data Mining, Graph Learning, Deep Learning, Big Data in Finance
Zhijun Ding
School of Computer Science and Technology, Tongji University
Changjun Jiang
School of Computer Science and Technology, Tongji University