RealFin: How Well Do LLMs Reason About Finance When Users Leave Things Unsaid?

📅 2026-02-06
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work addresses the overconfidence of large language models (LLMs) in financial reasoning, where they often produce incorrect answers because they fail to recognize that critical premises are missing. To systematically evaluate this limitation, the authors introduce REALFIN, a bilingual benchmark that generates fluent yet premise-deficient questions by deliberately removing key information from original financial exam items. A multi-task evaluation framework assesses models' capabilities along three dimensions: answering questions, identifying missing information, and proactively abstaining from responding when necessary. This study presents the first systematic assessment of LLMs' awareness of implicit assumption gaps in financial contexts, revealing significant reliability shortcomings in both general-purpose and finance-specific models. The findings underscore that trustworthy financial reasoning requires models to possess the metacognitive judgment to distinguish known from unknown, embodying the principle of "knowing what one knows."

📝 Abstract
Reliable financial reasoning requires knowing not only how to answer, but also when an answer cannot be justified. In real financial practice, problems often rely on implicit assumptions that are taken for granted rather than stated explicitly, causing them to appear solvable while lacking enough information for a definite answer. We introduce REALFIN, a bilingual benchmark that evaluates financial reasoning by systematically removing essential premises from exam-style questions while keeping them linguistically plausible. Based on this, we evaluate models under three formulations that test answering, recognizing missing information, and rejecting unjustified options, and find consistent performance drops when key conditions are absent. General-purpose models tend to over-commit and guess, while most finance-specialized models fail to clearly identify missing premises. These results highlight a critical gap in current evaluations and show that reliable financial models must know when a question should not be answered.
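The abstract's pipeline, removing an essential premise and then scoring models under three formulations, can be sketched as follows. This is a hypothetical illustration of the idea, not the authors' code: the item format, scoring rules, and the `cannot_be_determined` abstention token are all assumptions.

```python
# Hypothetical sketch of a REALFIN-style evaluation loop.
# Item fields and scoring heuristics are illustrative assumptions.

def make_premise_deficient(question: str, premise: str) -> str:
    """Remove an essential premise while keeping the question fluent."""
    return question.replace(premise, "").strip()

def score_response(task: str, response: str, gold: dict) -> bool:
    """Score one response under one of the three task formulations."""
    if task == "answer":            # standard QA on the intact question
        return response == gold["answer"]
    if task == "identify_missing":  # model must name the removed premise
        return gold["missing_premise"].lower() in response.lower()
    if task == "abstain":           # model should refuse to commit
        return response == "cannot_be_determined"
    raise ValueError(f"unknown task: {task}")

full = ("A bond pays a 5% annual coupon on a $1,000 face value."
        " What is the annual coupon payment?")
item = {
    "question": make_premise_deficient(full, " on a $1,000 face value"),
    "answer": "$50",
    "missing_premise": "face value",
}

# On the premise-deficient question, a reliable model abstains
# rather than guessing an answer it cannot justify.
print(item["question"])
print(score_response("abstain", "cannot_be_determined", item))  # True
```

Under this framing, a general-purpose model that guesses "$50" on the deficient question would score on the `answer` task but fail `abstain`, which is exactly the over-commitment pattern the paper reports.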
Problem

Research questions and friction points this paper is trying to address.

financial reasoning
implicit assumptions
missing information
unjustified answers
reliable reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

financial reasoning
missing premise detection
LLM reliability
REALFIN benchmark
implicit assumptions