🤖 AI Summary
Problem: There is currently no unified, open evaluation framework for rigorously assessing financial large language models (FinLLMs) on domain-specific tasks, including financial statement analysis, risk assessment, and regulatory compliance, or for evaluating their industrial readiness.
Method: We introduce the first open-source, community-driven FinLLM benchmark and dynamic leaderboard, establishing a novel “Financial AI Readiness” evaluation ecosystem. Leveraging resources from the Linux Foundation and Hugging Face, our approach integrates multi-task benchmarking, multimodal financial datasets, and a scalable agent-based evaluation protocol to enable co-evolution of models, datasets, and evaluation tasks.
Contribution/Results: We publicly release the first reproducible FinLLM leaderboard, significantly enhancing model transparency and cross-institutional comparability. This initiative advances standardization and equitable deployment of production-grade financial AI, fostering rigorous, collaborative, and industry-aligned evaluation practices.
📝 Abstract
Financial large language models (FinLLMs) with multimodal capabilities are envisioned to revolutionize applications across business, finance, accounting, and auditing. However, real-world adoption requires robust benchmarks of FinLLMs' and agents' performance. Maintaining an open leaderboard of models is crucial for encouraging innovative adoption and improving model effectiveness. In collaboration with the Linux Foundation and Hugging Face, we create an open FinLLM leaderboard, which serves as an open platform for assessing and comparing LLMs' performance on a wide spectrum of financial tasks. By democratizing access to advanced AI tools and financial knowledge, a chatbot or agent may enhance the analytical capabilities of the general public to a professional level within a few months of usage. This open leaderboard welcomes contributions from academia, the open-source community, industry, and stakeholders. In particular, we encourage contributions of new datasets, tasks, and models for continual updates. By fostering a collaborative and open ecosystem, we seek to ensure the long-term sustainability and relevance of LLMs and agents as they evolve with the financial sector's needs.