Towards Competent AI for Fundamental Analysis in Finance: A Benchmark Dataset and Evaluation

πŸ“… 2025-05-22
πŸ›οΈ arXiv.org
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Existing financial benchmarks primarily assess question-answering capabilities, failing to reflect large language models' (LLMs) reliability in realistic tasks such as generating fundamental analysis reports. To address this gap, we propose FinAR-Bench, the first fine-grained benchmark for financial statement analysis. Grounded in accounting standards, it decomposes report generation into three sequential, semantically distinct stages: information extraction, financial metric computation, and cross-period logical reasoning. High-quality ground-truth annotations are constructed via expert labeling, rule-based validation, and consistency checking. We introduce a novel structured three-step evaluation paradigm enabling quantifiable, reproducible assessment of LLM performance at each stage. Experiments reveal critical bottlenecks across state-of-the-art models: metric computation accuracy plateaus at ≈68%, while logical reasoning lags significantly (F1 < 45%). This work establishes a new standard and scalable infrastructure for evaluating financial AI capabilities.

πŸ“ Abstract
Generative AI, particularly large language models (LLMs), is beginning to transform the financial industry by automating tasks and helping to make sense of complex financial information. One especially promising use case is the automatic creation of fundamental analysis reports, which are essential for making informed investment decisions, evaluating credit risks, guiding corporate mergers, etc. While LLMs can attempt to generate these reports from a single prompt, the risks of inaccuracy are significant. Poor analysis can lead to misguided investments, regulatory issues, and loss of trust. Existing financial benchmarks mainly evaluate how well LLMs answer financial questions but do not reflect performance in real-world tasks like generating financial analysis reports. In this paper, we propose FinAR-Bench, a solid benchmark dataset focusing on financial statement analysis, a core competence of fundamental analysis. To make the evaluation more precise and reliable, we break this task into three measurable steps: extracting key information, calculating financial indicators, and applying logical reasoning. This structured approach allows us to objectively assess how well LLMs perform each step of the process. Our findings offer a clear understanding of LLMs' current strengths and limitations in fundamental analysis and provide a more practical way to benchmark their performance in real-world financial settings.
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs' accuracy in financial report generation
Assessing risks of AI errors in fundamental analysis
Creating benchmark for financial statement analysis tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

FinAR-Bench benchmark dataset for financial analysis
Three-step evaluation: extraction, calculation, and reasoning
Structured approach to assess LLM financial competence
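The three-step decomposition above can be sketched in miniature. The function and field names below are illustrative assumptions, not the paper's actual evaluation harness; the financial formulas (current ratio, ROE) are standard accounting definitions.

```python
# Hedged sketch of FinAR-Bench's three-step task decomposition.
# Names and data layout are hypothetical, not from the paper.

def extract_items(statement: dict) -> dict:
    """Step 1: extract the line items the metrics need from a statement."""
    keys = ("current_assets", "current_liabilities", "net_income", "equity")
    return {k: statement[k] for k in keys}

def compute_metrics(items: dict) -> dict:
    """Step 2: compute standard financial indicators from extracted items."""
    return {
        "current_ratio": items["current_assets"] / items["current_liabilities"],
        "roe": items["net_income"] / items["equity"],
    }

def compare_periods(prev: dict, curr: dict) -> dict:
    """Step 3: cross-period reasoning, here reduced to direction of change."""
    return {k: "improved" if curr[k] > prev[k] else "declined" for k in curr}

fy2023 = extract_items({"current_assets": 120.0, "current_liabilities": 80.0,
                        "net_income": 15.0, "equity": 100.0, "revenue": 300.0})
fy2024 = extract_items({"current_assets": 150.0, "current_liabilities": 90.0,
                        "net_income": 12.0, "equity": 110.0, "revenue": 320.0})
verdict = compare_periods(compute_metrics(fy2023), compute_metrics(fy2024))
print(verdict)  # {'current_ratio': 'improved', 'roe': 'declined'}
```

Because each step has a well-defined input and output, an LLM's answer at each stage can be scored against ground truth independently, which is what makes the benchmark's per-stage accuracy figures possible.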
πŸ”Ž Similar Papers
No similar papers found.
Zonghan Wu
SAIFS, East China Normal University
graph neural networks
Junlin Wang
Duke University
Computer Science, NLP
Congyuan Zou
Shanghai AI Finance School, East China Normal University
Chenhan Wang
OpenBayes.com
Yilei Shao
Shanghai AI Finance School, East China Normal University