FinMMR: Make Financial Numerical Reasoning More Multimodal, Comprehensive, and Challenging

📅 2025-08-06
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing financial numerical reasoning benchmarks lack multimodality, comprehensiveness, and sufficient challenge, and fail to assess MLLMs' multi-step precise reasoning over complex financial visuals (e.g., ownership structure diagrams, bar charts, tables) jointly with textual context. Method: We introduce FinMMR—the first bilingual (Chinese–English) multimodal financial numerical reasoning benchmark—covering 14 financial subdomains, constructed via human annotation and multi-source synthesis, and comprising 4.3K questions and 8.7K images. Contribution/Results: FinMMR is the first large-scale multimodal reasoning dataset systematically built from authentic Chinese financial research reports; it emphasizes finance-knowledge-driven multi-step numerical reasoning and substantially raises evaluation difficulty. Experiments show that the best-performing MLLMs achieve only 53.0% accuracy on its Hard subset, exposing a critical bottleneck in domain-specific multimodal reasoning.

📝 Abstract
We present FinMMR, a novel bilingual multimodal benchmark tailored to evaluate the reasoning capabilities of multimodal large language models (MLLMs) in financial numerical reasoning tasks. Compared to existing benchmarks, our work introduces three significant advancements. (1) Multimodality: We meticulously transform existing financial reasoning benchmarks, and construct novel questions from the latest Chinese financial research reports. FinMMR comprises 4.3K questions and 8.7K images spanning 14 categories, including tables, bar charts, and ownership structure charts. (2) Comprehensiveness: FinMMR encompasses 14 financial subdomains, including corporate finance, banking, and industry analysis, significantly exceeding existing benchmarks in financial domain knowledge breadth. (3) Challenge: Models are required to perform multi-step precise numerical reasoning by integrating financial knowledge with the understanding of complex financial images and text. The best-performing MLLM achieves only 53.0% accuracy on Hard problems. We believe that FinMMR will drive advancements in enhancing the reasoning capabilities of MLLMs in real-world scenarios.
Problem

Research questions and friction points this paper is trying to address.

Evaluate MLLMs' reasoning in financial numerical tasks
Expand financial domain coverage with 14 subdomains
Enhance multimodal reasoning with complex financial images
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal benchmark with 4.3K questions and 8.7K images
Covers 14 financial subdomains for comprehensive evaluation
Requires multi-step numerical reasoning with financial knowledge
👥 Authors
Zichen Tang — Beijing University of Posts and Telecommunications
Haihong E — Beijing University of Posts and Telecommunications
Jiacheng Liu — Beijing University of Posts and Telecommunications
Zhongjun Yang — Beijing University of Posts and Telecommunications
Rongjin Li — Xiamen University, VoiceAI
Zihua Rong — Beijing University of Posts and Telecommunications
Haoyang He — Beijing University of Posts and Telecommunications
Zhuodi Hao — Beijing University of Posts and Telecommunications
Xinyang Hu — Beijing University of Posts and Telecommunications
Kun Ji — Beijing University of Posts and Telecommunications
Ziyan Ma — Beijing University of Posts and Telecommunications
Mengyuan Ji — Beijing University of Posts and Telecommunications
Jun Zhang — Beijing University of Posts and Telecommunications
Chenghao Ma — Beijing University of Posts and Telecommunications
Qianhe Zheng — Beijing University of Posts and Telecommunications
Yang Liu — Beijing University of Posts and Telecommunications
Yiling Huang — University of Michigan
Xinyi Hu — Beijing University of Posts and Telecommunications
Qing Huang — Chinese Academy of Sciences
Zijian Xie — University of Michigan
Shiyao Peng — Beijing University of Posts and Telecommunications