🤖 AI Summary
This work addresses the lack of standardized benchmarks for evaluating the tool-calling capabilities of large language models (LLMs) in real-world financial scenarios. To this end, we propose FinMCP-Bench, the first benchmark that systematically integrates 65 authentic financial Model Context Protocol (MCP) tools spanning 10 major scenarios and 33 sub-scenarios, together with a high-fidelity evaluation set of 613 samples. The benchmark introduces multi-granular task complexity levels and composite evaluation metrics to holistically assess model performance, covering both invocation accuracy and reasoning ability across single-tool, multi-tool, and multi-turn interactive settings. Comprehensive evaluations of mainstream LLMs show that FinMCP-Bench effectively exposes the strengths and limitations of current models in financial agent applications, offering a reliable foundation for future research.
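As a rough illustration of what such a composite invocation metric could look like (the paper's exact formulas and weights are not given here, so the scoring logic, helper names, and equal weighting below are assumptions), a per-sample score might combine tool-selection accuracy with argument-filling accuracy:

```python
from dataclasses import dataclass

@dataclass
class ToolCall:
    name: str   # which MCP tool was invoked
    args: dict  # arguments passed to the tool

def invocation_score(pred: list[ToolCall], gold: list[ToolCall]) -> float:
    """Hypothetical composite score, not FinMCP-Bench's actual metric:
    did the model pick the right tools, and fill their arguments correctly?"""
    if not gold:
        return 1.0 if not pred else 0.0
    gold_by_name = {g.name: g for g in gold}
    tool_hits, arg_hits = 0, 0.0
    for p in pred:
        g = gold_by_name.get(p.name)
        if g is None:
            continue  # predicted a tool the gold answer never uses
        tool_hits += 1
        if g.args:
            # fraction of gold arguments reproduced exactly
            arg_hits += sum(p.args.get(k) == v for k, v in g.args.items()) / len(g.args)
        else:
            arg_hits += 1.0
    tool_acc = tool_hits / len(gold)
    arg_acc = arg_hits / len(gold)
    return 0.5 * tool_acc + 0.5 * arg_acc  # assumed equal weighting
```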
📝 Abstract
This paper introduces **FinMCP-Bench**, a novel benchmark for evaluating large language models (LLMs) on real-world financial problems solved through invocation of financial Model Context Protocol (MCP) tools. FinMCP-Bench contains 613 samples spanning 10 main scenarios and 33 sub-scenarios, featuring both real and synthetic user queries to ensure diversity and authenticity. It incorporates 65 real financial MCP tools and three sample types: single-tool, multi-tool, and multi-turn, enabling evaluation of models across different levels of task complexity. Using this benchmark, we systematically assess a range of mainstream LLMs and propose metrics that explicitly measure tool-invocation accuracy and reasoning capability. FinMCP-Bench provides a standardized, practical, and challenging testbed for advancing research on financial LLM agents.
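To make the three sample types concrete, one benchmark entry could plausibly be structured as sketched below; the field names, scenario labels, and the example tool `get_stock_quote` are illustrative assumptions, not FinMCP-Bench's actual schema:

```python
from dataclasses import dataclass, field
from typing import Literal

@dataclass
class FinMCPSample:
    """Illustrative shape of one benchmark sample (assumed schema)."""
    query: str           # real or synthetic user query
    scenario: str        # one of the 10 main scenarios
    sub_scenario: str    # one of the 33 sub-scenarios
    sample_type: Literal["single_tool", "multi_tool", "multi_turn"]
    expected_calls: list[dict] = field(default_factory=list)  # gold MCP tool calls

# A single-tool case: one query resolved by one MCP tool invocation.
# Multi-tool and multi-turn samples would carry several expected calls,
# the latter spread across conversation turns.
sample = FinMCPSample(
    query="What was AAPL's closing price yesterday?",
    scenario="market_data",        # hypothetical scenario name
    sub_scenario="equity_quotes",  # hypothetical sub-scenario name
    sample_type="single_tool",
    expected_calls=[{"tool": "get_stock_quote", "args": {"symbol": "AAPL"}}],
)
```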