🤖 AI Summary
To address the low accuracy of large language models (LLMs) on financial numerical reasoning tasks such as FinQA and ConvFinQA, this paper proposes FINDER, a two-step framework that combines generative retrieval with context-aware Program-of-Thought (PoT) prompting. FINDER first uses a generative retriever to extract the relevant facts from unstructured text and tabular data, then dynamically selects in-context few-shot examples for PoT prompting, so that each reasoning step is grounded in the retrieved evidence. Experiments show that FINDER achieves state-of-the-art performance, improving execution accuracy by 5.98% on FinQA and 4.05% on ConvFinQA over prior methods.
📝 Abstract
Despite continuous advancements in the capabilities of large language models (LLMs), numerical reasoning remains a challenging area. Techniques like chain-of-thought prompting, tree-of-thought prompting, and program-of-thought prompting guide LLMs through intermediate reasoning steps. Although in-context learning with few-shot prompting has improved performance, LLMs still lag behind state-of-the-art models on financial numerical reasoning datasets such as FinQA and ConvFinQA. In this work, we introduce FINDER, a novel two-step framework, to enhance LLMs' capabilities in financial numerical reasoning. The first step utilizes a generative retriever to extract relevant facts from unstructured data, including both text and tables. This is followed by context-aware Program of Thought prompting with dynamic selection of in-context examples. FINDER achieves new state-of-the-art performance on both the FinQA and ConvFinQA datasets, surpassing the previous best results with execution accuracy improvements of 5.98% and 4.05%, respectively.
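The second step described above (dynamic selection of in-context examples, followed by executing a generated program) can be illustrated with a minimal sketch. This is not the authors' implementation: the example pool, the Jaccard token-overlap similarity, the prompt layout, and all function names are assumptions made for illustration, and the LLM call that would produce the program is left out entirely.

```python
import re

# Hypothetical pool of solved (question, program) pairs; not taken from the paper.
EXAMPLE_POOL = [
    ("what was the percentage change in revenue from 2018 to 2019?",
     "ans = (revenue_2019 - revenue_2018) / revenue_2018 * 100"),
    ("what is the total of operating lease and capital lease obligations?",
     "ans = operating_lease + capital_lease"),
    ("what was the average net income over 2017 and 2018?",
     "ans = (net_income_2017 + net_income_2018) / 2"),
]


def tokens(text):
    """Lowercased alphanumeric tokens of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Token-overlap similarity; an assumed stand-in for the paper's selector."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def select_examples(question, pool, k=2):
    """Pick the k pool examples whose questions are most similar to the query."""
    ranked = sorted(pool, key=lambda ex: jaccard(question, ex[0]), reverse=True)
    return ranked[:k]


def build_prompt(question, facts, examples):
    """Assemble a few-shot PoT prompt from retrieved facts and chosen examples."""
    shots = "\n\n".join(f"Question: {q}\nProgram: {p}" for q, p in examples)
    fact_lines = "\n".join(f"{name} = {value}" for name, value in facts.items())
    return f"{shots}\n\nFacts:\n{fact_lines}\nQuestion: {question}\nProgram:"


def run_program(program, facts):
    """Execute a generated program over the retrieved facts and return `ans`.

    Emptying the builtins limits what the program can reach, but this is not
    a real sandbox; untrusted model output needs proper isolation.
    """
    scope = dict(facts)
    exec(program, {"__builtins__": {}}, scope)
    return scope["ans"]


# Facts as a retriever might return them (values invented for this sketch).
facts = {"revenue_2018": 200.0, "revenue_2019": 250.0}
question = "what was the percentage change in revenue from 2018 to 2019?"
prompt = build_prompt(question, facts, select_examples(question, EXAMPLE_POOL))
# In a full pipeline an LLM would complete `prompt`; here the program is hard-coded.
print(run_program("ans = (revenue_2019 - revenue_2018) / revenue_2018 * 100", facts))  # 25.0
```

Execution accuracy, the metric reported above, compares the value produced by running the program against the gold answer, which is why the program is executed rather than having the model state the final number directly.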