FinMCP-Bench: Benchmarking LLM Agents for Real-World Financial Tool Use under the Model Context Protocol

📅 2026-03-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the lack of standardized benchmarks for evaluating tool-calling capabilities of large language models (LLMs) in real-world financial scenarios. To this end, we propose FinMCP-Bench, the first benchmark that systematically integrates 65 authentic financial MCP tools spanning 10 major scenarios and 33 sub-scenarios, and constructs a high-fidelity evaluation set comprising 613 samples. The benchmark introduces multi-granular task complexity levels and composite evaluation metrics to holistically assess model performance in terms of both invocation accuracy and reasoning ability across single-tool, multi-tool, and multi-turn interactive settings. Comprehensive evaluations of mainstream LLMs demonstrate that FinMCP-Bench effectively uncovers the strengths and limitations of current models in financial agent applications, thereby offering a reliable foundation for future research.

📝 Abstract
This paper introduces FinMCP-Bench, a novel benchmark for evaluating large language models (LLMs) on real-world financial problems solved through invocation of financial Model Context Protocol (MCP) tools. FinMCP-Bench contains 613 samples spanning 10 main scenarios and 33 sub-scenarios, featuring both real and synthetic user queries to ensure diversity and authenticity. It incorporates 65 real financial MCPs and three sample types (single-tool, multi-tool, and multi-turn), allowing models to be evaluated across different levels of task complexity. Using this benchmark, we systematically assess a range of mainstream LLMs and propose metrics that explicitly measure tool-invocation accuracy and reasoning capability. FinMCP-Bench provides a standardized, practical, and challenging testbed for advancing research on financial LLM agents.
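For context on the invocation format the benchmark targets: MCP tool calls are framed as JSON-RPC 2.0 `tools/call` requests. The sketch below is illustrative only; the tool name `get_stock_quote` and its arguments are hypothetical and not taken from FinMCP-Bench.

```python
import json

# Hypothetical single-tool call in MCP's JSON-RPC 2.0 framing.
# The tool name and arguments are illustrative, not from the benchmark.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "get_stock_quote",
        "arguments": {"symbol": "AAPL"},
    },
}

# Serialize the request as it would be sent to an MCP server.
payload = json.dumps(request)
print(payload)
```

A benchmark sample in the single-tool setting would then score whether the model selects the correct tool name and fills its arguments correctly; multi-tool and multi-turn samples extend this to sequences of such calls.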
Problem

Research questions and friction points this paper is trying to address.

LLM agents
financial tool use
Model Context Protocol
benchmarking
tool invocation
Innovation

Methods, ideas, or system contributions that make the work stand out.

FinMCP-Bench
Model Context Protocol
Financial LLM Agents
Tool Invocation
Benchmarking
Jie Zhu
Alibaba Group, Tongyi Dianjin Team
LLMs, Natural Language Generation
Yimin Tian
Qwen DianJin Team, Alibaba Cloud Computing
Boyang Li
Qwen DianJin Team, Alibaba Cloud Computing
Kehao Wu
YINGMI Wealth Management
Zhongzhi Liang
YINGMI Wealth Management
Junhui Li
Soochow University
Xianyin Zhang
Qwen DianJin Team, Alibaba Cloud Computing
Lifan Guo
Researcher, Drexel University
Machine Learning
Feng Chen
Qwen DianJin Team, Alibaba Cloud Computing
Yong Liu
YINGMI Wealth Management
Chi Zhang
Qwen DianJin Team, Alibaba Cloud Computing