🤖 AI Summary
Large language models (LLMs) perform poorly on numerical question answering over financial documents that mix tables and text, and conventional critic agents rely on oracle labels and lack robustness. Method: This paper proposes a self-correcting multi-agent system comprising: (1) a robust, oracle-free critic mechanism that autonomously identifies numerical reasoning errors; (2) a collaborative calculator agent, decoupled from the LLM, that performs precise arithmetic computations; and (3) integrated techniques including programmatic chain-of-thought enhancement, dynamic self-correction, structured numerical parsing, and interactive reasoning-chain optimization. Results: On a financial document QA benchmark, the approach reduces the error rate by 37% relative to Program-of-Thought (PoT), reaches 89.2% accuracy on numerical answers, and markedly improves reasoning safety and resistance to hallucination.
📝 Abstract
Large language models (LLMs) have shown impressive capabilities on numerous natural language processing tasks, yet they still struggle with numerical question answering over financial documents that combine tabular and textual data. Recent work has shown that critic agents (i.e., self-correction) are effective for this task when oracle labels are available. Building on this framework, this paper examines how the traditional critic agent behaves when oracle labels are not available, and shows experimentally that its performance deteriorates in this scenario. Motivated by this finding, we present an improved critic agent together with a calculator agent; this combination outperforms the previous state-of-the-art approach (program-of-thought) and is safer. Furthermore, we investigate how our agents interact with each other and how this interaction affects their performance.
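To make the "calculator agent decoupled from the LLM" idea concrete, here is a minimal sketch of one plausible design: the LLM emits arithmetic as explicit tool calls in its reasoning text, and a separate, deterministic agent evaluates them with exact decimal arithmetic. The `CALC(...)` call syntax, function names, and tokenization scheme below are illustrative assumptions, not the paper's actual interface.

```python
import re
from decimal import Decimal, getcontext

getcontext().prec = 28  # exact-enough decimal arithmetic, avoiding float rounding

# Hypothetical tool-call syntax the LLM is prompted to emit, e.g. CALC(356.2 - 298.7)
CALC_PATTERN = re.compile(r"CALC\(([^)]+)\)")

def evaluate(expr: str) -> Decimal:
    """Evaluate a flat arithmetic expression (+, -, *, /) with Decimal precision."""
    # Tokenize into nonnegative numbers and binary operators; strip thousands separators.
    tokens = re.findall(r"\d+(?:\.\d+)?|[+\-*/]", expr.replace(",", ""))
    # Single left-to-right pass that respects precedence: * and / fold into the
    # top of the stack immediately; + and - defer their operand to a final sum.
    stack = [Decimal(tokens[0])]
    i = 1
    while i < len(tokens):
        op, num = tokens[i], Decimal(tokens[i + 1])
        if op == "*":
            stack[-1] *= num
        elif op == "/":
            stack[-1] /= num
        elif op == "+":
            stack.append(num)
        else:  # "-"
            stack.append(-num)
        i += 2
    return sum(stack, Decimal(0))

def run_calculator_agent(llm_output: str) -> str:
    """Replace each CALC(...) call in the LLM's reasoning with its exact result."""
    return CALC_PATTERN.sub(lambda m: str(evaluate(m.group(1))), llm_output)

reasoning = "Revenue grew by CALC(356.2 - 298.7) million year over year."
print(run_calculator_agent(reasoning))
# → Revenue grew by 57.5 million year over year.
```

Because the arithmetic never passes through the LLM, numerical errors the critic flags can be localized to expression *construction* rather than computation, which is the point of decoupling the two roles.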