🤖 AI Summary
This study systematically evaluates the breadth and accuracy of large language models' (LLMs) knowledge of historical financial data for U.S. publicly traded companies. Method: Using a benchmark of more than 197,000 fact-based questions grounded in authoritative financial statements, and combining truth verification, multi-dimensional regression analysis, prompt engineering, empirical response comparison, and consistency validation, the study quantifies LLM performance across temporal, structural, and linguistic dimensions. Contribution/Results: We identify three pervasive limitations: (1) temporal decay, i.e., reduced accuracy for earlier fiscal periods; (2) scale bias, i.e., stronger recall for larger firms but higher hallucination rates for them; and (3) selective coverage correlated with institutional attention and financial-statement readability. Crucially, we provide the first quantitative evidence that firm size, analyst coverage, and disclosure clarity significantly modulate LLM knowledge accuracy. The study releases an open-source financial LLM evaluation benchmark, establishing a methodological and empirical foundation for assessing LLM reliability in finance.
📝 Abstract
Large Language Models (LLMs) are frequently used as sources of knowledge for question answering. While it is known that LLMs lack access to real-time data or data produced after a model's training cutoff, it is less clear how well their knowledge spans historical information. In this study, we assess the breadth of LLMs' knowledge using financial data of U.S. publicly traded companies, evaluating more than 197,000 questions and comparing model responses against factual data. We further explore how company characteristics, such as size, retail investment, institutional attention, and readability of financial filings, affect the accuracy of the knowledge represented in LLMs. Our results reveal that LLMs are less informed about past financial performance but display stronger awareness of larger companies and more recent information. Interestingly, our analysis also reveals that LLMs are more likely to hallucinate for larger companies, especially for data from more recent years. We will make the code, prompts, and model outputs public upon publication of this work.