FinGAIA: An End-to-End Benchmark for Evaluating AI Agents in Finance

📅 2025-07-23
🤖 AI Summary
Problem: Existing AI agents for finance lack systematic evaluation of their multi-step, multi-tool collaboration capabilities. Method: The paper introduces FinGAIA, an end-to-end benchmark for financial AI agents that covers 407 real-world tasks across seven sub-domains (e.g., securities, funds, banking) and organizes them in a three-level, scenario-based evaluation framework. Ten state-of-the-art large language model-driven agents are assessed zero-shot under unified multi-task, multi-tool interaction settings. Contribution/Results: FinGAIA identifies five prevalent failure modes, including cross-modal misalignment and financial terminology bias. The best-performing agent (ChatGPT) achieves only 48.9% accuracy, over 35 percentage points below human financial experts, which highlights critical bottlenecks in complex financial decision-making and domain-specific collaboration. FinGAIA provides a reproducible benchmark and concrete directions for advancing financial AI agents.

📝 Abstract
The rapid development of AI agents presents unprecedented opportunities for automating complex tasks across various domains. However, their multi-step, multi-tool collaboration capabilities in the financial sector remain underexplored. This paper introduces FinGAIA, an end-to-end benchmark designed to evaluate the practical abilities of AI agents in the financial domain. FinGAIA comprises 407 meticulously crafted tasks spanning seven major financial sub-domains: securities, funds, banking, insurance, futures, trusts, and asset management. These tasks are organized into three hierarchical levels of scenario depth: basic business analysis, asset decision support, and strategic risk management. We evaluated 10 mainstream AI agents in a zero-shot setting. The best-performing agent, ChatGPT, achieved an overall accuracy of 48.9%, which, while superior to non-professionals, still lags behind financial experts by over 35 percentage points. Error analysis revealed five recurring failure patterns, including Cross-modal Alignment Deficiency, Financial Terminological Bias, and Operational Process Awareness Barrier. These patterns point to crucial directions for future research. Our work provides the first agent benchmark closely related to the financial domain, aiming to objectively assess and promote the development of agents in this crucial field. Partial data is available at https://github.com/SUFE-AIFLM-Lab/FinGAIA.
Problem

Research questions and friction points this paper is trying to address.

Evaluating AI agents' financial task performance
Assessing multi-step, multi-tool collaboration in finance
Identifying failure patterns in financial AI agents
Innovation

Methods, ideas, or system contributions that make the work stand out.

End-to-end benchmark for financial AI agents
407 tasks across seven financial sub-domains
Hierarchical levels for scenario depth evaluation
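
As a rough illustration of how results on a benchmark organized this way might be tallied, the sketch below groups per-task outcomes by sub-domain or scenario level and reports accuracy for each group. Only the seven sub-domain names and the three level names come from the abstract. The `TaskResult` record, its field names, and the case-insensitive exact-match scoring rule are hypothetical choices made for this example and are not taken from the FinGAIA data or evaluation code.

```python
from collections import defaultdict
from dataclasses import dataclass

# Sub-domains and levels as named in the abstract.
SUBDOMAINS = ("securities", "funds", "banking", "insurance",
              "futures", "trusts", "asset management")
LEVELS = ("basic business analysis", "asset decision support",
          "strategic risk management")


@dataclass(frozen=True)
class TaskResult:
    """Hypothetical record for one agent answer; not the FinGAIA schema."""
    subdomain: str
    level: str
    expected: str
    predicted: str

    @property
    def correct(self) -> bool:
        # Assumption: case-insensitive exact match after trimming whitespace.
        return self.expected.strip().lower() == self.predicted.strip().lower()


def overall_accuracy(results):
    """Fraction of correct answers; 0.0 for an empty result set."""
    results = list(results)
    if not results:
        return 0.0
    return sum(r.correct for r in results) / len(results)


def accuracy_by(results, key):
    """Accuracy per group, where key is 'subdomain' or 'level'."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for r in results:
        group = getattr(r, key)
        totals[group] += 1
        hits[group] += r.correct  # True counts as 1, False as 0
    return {group: hits[group] / totals[group] for group in totals}
```

Calling `accuracy_by(results, "level")` on a list of such records gives one accuracy figure per scenario level, which is the kind of breakdown a three-level framework makes possible alongside a single overall score.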
Authors

Lingfeng Zeng
Shanghai University of Finance and Economics
Large Language Models
Fangqi Lou
Shanghai University of Finance and Economics
Zixuan Wang
Shanghai University of Finance and Economics
Jiajie Xu
Shanghai University of Finance and Economics
Jinyi Niu
Fudan University
Mengping Li
Shanghai University of Finance and Economics
Yifan Dong
Boise State University
Qi Qi
Shanghai University of Finance and Economics
Wei Zhang
Shanghai University of Finance and Economics
Ziwei Yang
Bioinformatics Center, Institute for Chemical Research, Kyoto University
Bioinformatics, Machine Learning, Computational Biology, Biomedical Data Science
Jun Han
Shanghai University of Finance and Economics
Ruilun Feng
Shanghai University of Finance and Economics
Ruiqi Hu
Shanghai University of Finance and Economics
Lejie Zhang
Shanghai University of Finance and Economics
Zhengbo Feng
Shanghai University of Finance and Economics
Yicheng Ren
Shanghai University of Finance and Economics
Xin Guo
Shanghai University of Finance and Economics
Zhaowei Liu
Shanghai University of Finance and Economics
Dongpo Cheng
Shanghai University of Finance and Economics
Weige Cai
Shanghai University of Finance and Economics
Liwen Zhang
Shanghai University of Finance and Economics