CFBenchmark-MM: Chinese Financial Assistant Benchmark for Multimodal Large Language Model

📅 2025-06-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing multimodal large language models (MLLMs) show limited efficiency and robustness when processing multimodal financial data, such as charts, graphs, and tables, in Chinese contexts. Method: We introduce CFBenchmark-MM, the first Chinese multimodal financial evaluation benchmark, comprising over 9,000 image–text question–answer pairs covering bar charts, line charts, pie charts, structural diagrams, and tabular data. We propose a staged visual-input assessment mechanism and a progressive, domain-specific evaluation framework that jointly attribute performance to visual parsing capability and financial concept comprehension. Contribution/Results: Comprehensive evaluation reveals that current MLLMs underperform significantly on financial multimodal tasks, primarily due to visual misinterpretation and misunderstanding of financial concepts. Our analysis empirically validates the necessity of domain-adaptive optimization and demonstrates substantial room for improvement through targeted fine-tuning and architectural enhancements.

📝 Abstract
Multimodal Large Language Models (MLLMs) have rapidly evolved with the growth of Large Language Models (LLMs) and are now applied in various fields. In finance, the integration of diverse modalities such as text, charts, and tables is crucial for accurate and efficient decision-making. Therefore, an effective evaluation system that incorporates these data types is essential for advancing financial applications. In this paper, we introduce CFBenchmark-MM, a Chinese multimodal financial benchmark with over 9,000 image–question pairs featuring tables, histogram charts, line charts, pie charts, and structural diagrams. Additionally, we develop a staged evaluation system to assess MLLMs in handling multimodal information by providing different visual content step by step. Despite MLLMs having inherent financial knowledge, experimental results still show limited efficiency and robustness in handling multimodal financial context. Further analysis of incorrect responses reveals that misinterpretation of visual content and misunderstanding of financial concepts are the primary issues. Our research validates the significant, yet underexploited, potential of MLLMs in financial analysis, highlighting the need for further development and domain-specific optimization to encourage their enhanced use in the financial domain.
Problem

Research questions and friction points this paper is trying to address.

Evaluating MLLMs' performance in multimodal financial analysis
Addressing misinterpretation of visual content in financial contexts
Improving efficiency and robustness in financial decision-making
Innovation

Methods, ideas, or system contributions that make the work stand out.

Chinese multimodal financial benchmark with over 9,000 image–question pairs
Staged evaluation system for multimodal information
Analysis reveals visual and financial concept issues
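The staged evaluation idea above, querying a model with progressively richer visual context and attributing errors to visual parsing versus concept understanding, can be sketched as a simple loop. This is a hypothetical illustration of the general approach, not the paper's actual protocol: the stage names, the `model(context, question)` interface, and the sample fields are all assumptions.

```python
# Hypothetical sketch of a staged visual-input evaluation loop.
# Each stage gives the model more context; comparing per-stage accuracy
# hints at whether failures stem from visual parsing or comprehension.

def staged_evaluation(model, sample):
    """Query `model` with progressively richer context for one QA sample
    and record whether each stage's answer matches the gold answer."""
    stages = {
        "image_only": [sample["image"]],
        "image_plus_caption": [sample["image"], sample["caption"]],
        "image_plus_table": [sample["image"], sample["table_text"]],
    }
    results = {}
    for name, context in stages.items():
        answer = model(context, sample["question"])  # hypothetical interface
        results[name] = (answer == sample["gold_answer"])
    return results

# Toy model and sample to exercise the loop: the "model" answers
# correctly only when it receives context beyond the raw image.
toy_model = lambda context, question: "up" if len(context) > 1 else "down"
toy_sample = {
    "image": "<chart bytes>",
    "caption": "Quarterly revenue",
    "table_text": "Q1 10 | Q2 12",
    "question": "What is the revenue trend?",
    "gold_answer": "up",
}
print(staged_evaluation(toy_model, toy_sample))
# → {'image_only': False, 'image_plus_caption': True, 'image_plus_table': True}
```

A pattern like this one (correct only once textual context is added) would suggest the bottleneck is visual parsing rather than financial reasoning.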
Jiangtong Li
School of Computer Science and Technology, Tongji University
Yiyun Zhu
School of Computer Science and Technology, Tongji University
Dawei Cheng
Tongji University
Research interests: Data Mining, Graph Learning, Deep Learning, Big Data in Finance
Zhijun Ding
School of Computer Science and Technology, Tongji University
Changjun Jiang
School of Computer Science and Technology, Tongji University