MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation

📅 2025-06-16
🤖 AI Summary
Existing financial LLM evaluation benchmarks suffer from limitations including monolinguality, single-modality design, and oversimplified tasks, and so fail to reflect the cross-lingual and multimodal complexity of real-world financial scenarios. To address this, the authors propose MultiFinBen, the first difficulty-aware, multilingual (e.g., English, Spanish), and multimodal (text, vision, audio) benchmark for global finance. The contributions include: (1) novel cross-lingual question answering tasks (PolyFiQA-Easy/Expert) and OCR-embedded document understanding tasks (EnglishOCR, SpanishOCR); (2) a dynamic, difficulty-aware item selection mechanism and a unified multimodal evaluation framework; and (3) multilingual alignment modeling and OCR-text joint reasoning techniques. Extensive experiments across 22 state-of-the-art models reveal significant performance degradation, up to 38%, on cross-lingual multimodal financial tasks, highlighting critical gaps in current capabilities. MultiFinBen is publicly released to foster fair, reproducible, and inclusive advancement of financial AI.


📝 Abstract
Recent advances in large language models (LLMs) have accelerated progress in financial NLP and applications, yet existing benchmarks remain limited to monolingual and unimodal settings, often over-relying on simple tasks and failing to reflect the complexity of real-world financial communication. We introduce MultiFinBen, the first multilingual and multimodal benchmark tailored to the global financial domain, evaluating LLMs across modalities (text, vision, audio) and linguistic settings (monolingual, bilingual, multilingual) on domain-specific tasks. The benchmark contributes two novel task families: PolyFiQA-Easy and PolyFiQA-Expert, the first multilingual financial benchmarks requiring models to perform complex reasoning over mixed-language inputs; and EnglishOCR and SpanishOCR, the first OCR-embedded financial QA tasks, which challenge models to extract and reason over information from visual-text financial documents. Moreover, we propose a dynamic, difficulty-aware selection mechanism and curate a compact, balanced benchmark rather than simply aggregating existing datasets. Extensive evaluation of 22 state-of-the-art models reveals that even the strongest models, despite their general multimodal and multilingual capabilities, struggle dramatically when faced with complex cross-lingual and multimodal tasks in the financial domain. MultiFinBen is publicly released to foster transparent, reproducible, and inclusive progress in financial studies and applications.
Problem

Research questions and friction points this paper is trying to address.

Evaluating financial LLMs across multilingual and multimodal settings
Assessing model performance on complex reasoning with mixed-language inputs
Challenging models with OCR-embedded financial document understanding tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multilingual multimodal benchmark for finance
Dynamic difficulty-aware task selection
OCR-embedded financial QA tasks
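The abstract does not spell out how the dynamic, difficulty-aware selection works. As a minimal sketch only, assuming difficulty is measured as one minus the mean accuracy of a pool of reference models on each candidate dataset, and that "balanced" means keeping at most a fixed number of datasets per (modality, difficulty tier) cell, the idea could look like the following. The `Candidate` type, the tier thresholds, the `per_cell` parameter, and the dataset names are all hypothetical and are not taken from MultiFinBen.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Candidate:
    name: str            # hypothetical dataset identifier
    modality: str        # e.g. "text", "vision", "audio"
    scores: list[float]  # accuracy of each reference model, in [0, 1]


def difficulty(c: Candidate) -> float:
    """Assumed definition: 1 - mean accuracy across reference models."""
    return 1.0 - mean(c.scores)


def tier(d: float, easy_max: float = 0.3, hard_min: float = 0.6) -> str:
    # Thresholds are illustrative assumptions, not values from the paper.
    if d < easy_max:
        return "easy"
    if d >= hard_min:
        return "hard"
    return "medium"


def select_balanced(cands: list[Candidate], per_cell: int = 1) -> list[str]:
    """Keep at most `per_cell` datasets per (modality, tier) cell, hardest first."""
    cells: dict[tuple[str, str], list[Candidate]] = {}
    for c in cands:
        cells.setdefault((c.modality, tier(difficulty(c))), []).append(c)
    chosen: list[str] = []
    for key in sorted(cells):  # deterministic order over cells
        ranked = sorted(cells[key], key=difficulty, reverse=True)
        chosen.extend(c.name for c in ranked[:per_cell])
    return chosen


# Hypothetical candidate pool with made-up reference-model accuracies.
cands = [
    Candidate("text_a", "text", [0.9, 0.8, 0.85]),  # easy
    Candidate("text_b", "text", [0.9, 0.9]),        # easy, but less hard than text_a
    Candidate("text_c", "text", [0.2, 0.3]),        # hard
    Candidate("vision_a", "vision", [0.5, 0.6]),    # medium
    Candidate("audio_a", "audio", [0.1, 0.2]),      # hard
]
print(select_balanced(cands))  # ['audio_a', 'text_a', 'text_c', 'vision_a']
```

Keeping the hardest dataset in each cell biases the compact benchmark toward challenging items while still covering every modality and tier present in the pool. A real implementation might instead sample within cells, or re-score difficulty as new models are evaluated, which is one possible reading of "dynamic".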
Authors
Xueqing Peng, Yale University
Lingfei Qian, Yale University
Yan Wang, The FinAI
Ruoyu Xiang, New York University
Yueru He, Columbia University (Finance, Large Language Models)
Yang Ren, The FinAI
Mingyang Jiang, Shanghai Jiao Tong University (robotics, intelligent vehicles, machine learning)
Jeff Zhao, The FinAI
Huan He, The FinAI
Yi Han, Georgia Institute of Technology
Yun Feng, Asian Development Bank
Yuechen Jiang, University of Hawaii at Manoa (NLP, Multimodal, LLM Agents, FinTech)
Yupeng Cao, Stevens Institute of Technology (Natural Language Processing, Multimodal, Trustworthy AI)
Haohang Li, Stevens Institute of Technology (Mechanistic Interpretability, Language Models, LLM Agents, FinTech)
Yangyang Yu, Stevens Institute of Technology (Cognitive Science, Language Agent Design, Bayesian Inference, Multimodal Learning)
Xiaoyu Wang, New York University
Penglei Gao, Postdoctoral Fellow at Quantitative Health Sciences, Cleveland Clinic Lerner Research Institute (Machine Learning, Time Series Analysis, Medical Image Processing, Clinical Data Analysis, Statistics)
Shengyuan Lin, Carnegie Mellon University
Keyi Wang, Columbia University
Shanshan Yang, Stevens Institute of Technology
Yilun Zhao, Yale University
Zhiwei Liu, University of Manchester
Peng Lu, Harvard University
Jerry Huang, Harvard University
Suyuchen Wang, Université de Montréal / Mila (NLP, LLMs, VLMs, Deep Learning)
Triantafillos Papadopoulos, Athens University of Economics and Business and Archimedes
Polydoros Giannouris, University of Manchester
E. Soufleri, Archimedes/Athena RC, Athens
Nuo Chen, National University of Singapore
Guojun Xiong, Harvard University, Department of Computer Science (Reinforcement Learning, Restless Bandits, Networking, Financial Agents)
Zhiyang Deng, Stevens Institute of Technology
Yiji Zhao, The FinAI
Mingquan Lin, Assistant Professor at University of Minnesota (Medical Image Analysis, Deep Learning)
Mei Qiu, Augusta University
Kaleb E Smith, NVIDIA
Arman Cohan, Yale University; Allen Institute for AI (Natural Language Processing, Machine Learning, Artificial Intelligence)
Xiao-Yang Liu, Columbia University (Tensors, Deep Learning, Reinforcement Learning, Big Data)
Jimin Huang, The FinAI (Computational Finance)
Alejandro Lopez-Lira, Assistant Professor of Finance, University of Florida (Fintech, Machine Learning, Asset Pricing, Macro Finance, Private Equity)
Xi Chen, New York University
Jun'ichi Tsujii, National Institute of Advanced Industrial Science and Technology
Jian-yun Nie, University of Montreal
Sophia Ananiadou, Professor, Computer Science, University of Manchester, National Centre for Text Mining (Natural Language Processing, Text Mining, Computational Linguistics, Artificial Intelligence)
Qianqian Xie, Wuhan University (NLP, LLMs)