Open FinLLM Leaderboard: Towards Financial AI Readiness

📅 2025-01-19
📈 Citations: 1
Influential: 0
🤖 AI Summary
Problem: The field lacks a unified, open evaluation framework for rigorously assessing financial large language models (FinLLMs) on domain-specific tasks, including financial statement analysis, risk assessment, and regulatory compliance, and for evaluating their industrial readiness. Method: We introduce the first open-source, community-driven FinLLM benchmark and dynamic leaderboard, establishing a novel “Financial AI Readiness” evaluation ecosystem. Leveraging resources from the Linux Foundation and Hugging Face, our approach integrates multi-task benchmarking, multimodal financial datasets, and a scalable agent-based evaluation protocol so that models, datasets, and evaluation tasks can evolve together. Contribution/Results: We publicly release the first reproducible FinLLM leaderboard, improving model transparency and cross-institutional comparability. The initiative advances standardization and equitable deployment of production-grade financial AI and fosters rigorous, collaborative, and industry-aligned evaluation practices.

📝 Abstract
Financial large language models (FinLLMs) with multimodal capabilities are envisioned to revolutionize applications across business, finance, accounting, and auditing. However, real-world adoption requires robust benchmarks of FinLLMs' and agents' performance. Maintaining an open leaderboard of models is crucial for encouraging innovative adoption and improving model effectiveness. In collaboration with the Linux Foundation and Hugging Face, we create an open FinLLM leaderboard, which serves as an open platform for assessing and comparing LLMs' performance on a wide spectrum of financial tasks. By democratizing access to advanced AI tools and financial knowledge, a chatbot or agent may enhance the analytical capabilities of the general public to a professional level within a few months of usage. This open leaderboard welcomes contributions from academia, the open-source community, industry, and stakeholders. In particular, we encourage contributions of new datasets, tasks, and models for continual updates. By fostering a collaborative and open ecosystem, we seek to ensure the long-term sustainability and relevance of LLMs and agents as they evolve with the financial sector's needs.
Problem

Research questions and friction points this paper is trying to address.

Financial Language Models
Performance Evaluation
Adaptive Financial Demand
Innovation

Methods, ideas, or system contributions that make the work stand out.

FinLLMs
Open Benchmark
Financial AI
Shengyuan Colin Lin
Rensselaer Polytechnic Institute, USA
Felix Tian
Rensselaer Polytechnic Institute, USA
Keyi Wang
Columbia University, USA
Xingjian Zhao
Rensselaer Polytechnic Institute, USA
Jimin Huang
The Fin AI
computational finance
Qianqian Xie
Wuhan University
NLP, LLM
Luca Borella
Linux Foundation, New York, USA
Matt White
GM of AI, Linux Foundation; Executive Director, PyTorch Foundation; UC Berkeley, Berkeley, USA
Christina Dan Wang
New York University Shanghai
Kairong Xiao
Columbia Business School
Financial Intermediation, Industrial Organization, Monetary Economics, Political Economy
Xiao-Yang Liu Yanglet
Columbia University and Rensselaer Polytechnic Institute, New York, NY, USA
Li Deng
Chief AI Officer, Citadel (former)
Speech and Language Processing, Deep Learning, Artificial Intelligence, Signal Processing, Financial Engineering