Beyond the Reported Cutoff: Where Large Language Models Fall Short on Financial Knowledge

📅 2025-03-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study systematically evaluates the breadth and accuracy of large language models’ (LLMs) knowledge regarding historical financial data of U.S. publicly traded companies. Method: Leveraging a benchmark of 197,000 fact-based questions—grounded in authoritative financial statements—and integrating truth verification, multi-dimensional regression analysis, prompt engineering, empirical response comparison, and consistency validation, the study quantifies LLM performance across temporal, structural, and linguistic dimensions. Contribution/Results: We identify three pervasive limitations: (1) temporal decay—reduced accuracy for earlier fiscal periods; (2) scale bias—stronger recall for larger firms yet higher hallucination rates among them; and (3) selective coverage bias correlated with institutional attention and financial statement readability. Crucially, we provide the first quantitative evidence that firm size, analyst coverage, and disclosure clarity significantly modulate LLM knowledge accuracy. The study releases an open-source financial LLM evaluation benchmark, establishing a methodological foundation and empirical basis for assessing LLM reliability in finance.
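The evaluation described above hinges on comparing model answers to ground-truth figures from financial statements and classifying failures as hallucinations. A minimal sketch of such a scoring step might look like the following; the function name, abstention keywords, and 1% relative tolerance are illustrative assumptions, not the paper's released benchmark code.

```python
# Hypothetical sketch of fact-based QA scoring: compare a model's numeric
# answer to a ground-truth figure filed in financial statements.
# The tolerance and abstention keywords below are assumptions for illustration.

def score_answer(model_answer: str, truth: float, rel_tol: float = 0.01) -> str:
    """Classify a response as 'correct', 'hallucination', or 'abstain'."""
    text = model_answer.strip().lower()
    # Treat explicit refusals as abstentions rather than hallucinations
    if any(kw in text for kw in ("i don't know", "not sure", "cannot")):
        return "abstain"
    try:
        value = float(text.replace(",", "").replace("$", ""))
    except ValueError:
        return "hallucination"  # an unparseable factual claim counts as wrong
    # Accept answers within a relative tolerance of the filed figure
    if truth != 0 and abs(value - truth) / abs(truth) <= rel_tol:
        return "correct"
    return "hallucination"

# Example: score a small batch of (model answer, ground truth) pairs
pairs = [("$394,328", 394_328.0), ("I don't know.", 81_462.0), ("50000", 81_462.0)]
labels = [score_answer(answer, truth) for answer, truth in pairs]
print(labels)  # ['correct', 'abstain', 'hallucination']
```

Aggregating such labels per fiscal year and per firm is what would expose the temporal-decay and scale-bias patterns the summary reports.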

📝 Abstract
Large Language Models (LLMs) are frequently utilized as sources of knowledge for question-answering. While it is known that LLMs may lack access to real-time data or newer data produced after the model's cutoff date, it is less clear how their knowledge spans across historical information. In this study, we assess the breadth of LLMs' knowledge using financial data of U.S. publicly traded companies by evaluating more than 197k questions and comparing model responses to factual data. We further explore the impact of company characteristics, such as size, retail investment, institutional attention, and readability of financial filings, on the accuracy of knowledge represented in LLMs. Our results reveal that LLMs are less informed about past financial performance, but they display a stronger awareness of larger companies and more recent information. Interestingly, at the same time, our analysis also reveals that LLMs are more likely to hallucinate for larger companies, especially for data from more recent years. We will make the code, prompts, and model outputs public upon the publication of the work.
Problem

Research questions and friction points this paper is trying to address.

Assessing LLMs' historical financial knowledge gaps
Evaluating company traits' impact on LLM accuracy
Identifying LLM hallucination trends for large firms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluating LLMs on historical financial data
Assessing company characteristics' impact on accuracy
Analyzing hallucination trends in larger companies
Agam Shah
PhD Candidate, Georgia Institute of Technology
Natural Language Processing · Finance · Data Science · Computational Science
Liqin Ye
College of Computing & Scheller College of Business, Georgia Institute of Technology, Atlanta, GA
Sebastian Jaskowski
College of Computing & Scheller College of Business, Georgia Institute of Technology, Atlanta, GA
Wei Xu
College of Computing & Scheller College of Business, Georgia Institute of Technology, Atlanta, GA
Sudheer Chava
Alton M. Costley Chair and Professor of Finance, Georgia Tech
Credit Risk · Banking · FinTech · Household Finance · Derivatives