AI Summary
This work addresses the challenge large language models face in accurately extracting and reasoning about numerical information from unstructured text and semi-structured tables in financial documents. To enhance numerical understanding and computation capabilities, the authors propose a "structure-first, reason-later" framework that automatically constructs a knowledge graph by leveraging intrinsic document patterns, thereby injecting structured semantics into the language model. Experimental results based on the Llama 3.1 8B Instruct model demonstrate that this approach achieves a relative improvement of approximately 12% in execution accuracy over the base model on the FinQA benchmark, significantly enhancing both the accuracy and logical coherence of financial numerical reasoning.
Abstract
Numerical reasoning is an important task in the analysis of financial documents. It involves deriving numerical answers, supported by logical justification, for queries posed over financial texts. Recently, Large Language Models (LLMs) have shown promising results in multiple Question-Answering (Q-A) systems that require logical reasoning. Because finance-related documents often contain long and complex contexts, LLMs appear well-suited for building high-quality automated financial question-answering systems. However, LLMs often struggle to process the many numbers in financial reports accurately. Extracting numerical data from unstructured text and semi-structured tables, and then performing calculations on it reliably, remains a significant bottleneck for numerical reasoning in most state-of-the-art LLMs. Recent studies have shown that structured data augmentations, such as Knowledge Graphs (KGs), notably improve LLM predictions and the accompanying logical explanations. It is therefore important to account for the structured information inherent in financial reports when applying LLMs to financial analytics. This paper proposes a framework that incorporates structured information through KGs alongside LLM predictions for numerical reasoning tasks. The KGs are extracted directly from the document being processed, using a proposed schema. We evaluated the proposed framework on the FinQA benchmark using an open-source LLM, Llama 3.1 8B Instruct, and observed that it improved execution accuracy by approximately 12% relative to the vanilla LLM.
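To make the idea of extracting structure from a document before reasoning more concrete, the following is a minimal sketch in Python. It is not the authors' schema or pipeline, which the abstract does not specify. It assumes a simple hypothetical schema in which each table cell becomes a (row label, column header, value) triple, and the triples are serialized as plain-text facts ahead of the question. The function names, the prompt layout, and the table values are all invented for illustration.

```python
from typing import List, Tuple

# A triple is (subject, predicate, object); this schema is an assumption,
# not the schema proposed in the paper.
Triple = Tuple[str, str, str]


def table_to_triples(header: List[str], rows: List[List[str]]) -> List[Triple]:
    """Turn each table cell into a (row label, column header, value) triple.

    Assumes the first column of every row holds the row label.
    """
    triples: List[Triple] = []
    for row in rows:
        row_label = row[0]
        # Pair every remaining cell with its column header.
        for col_name, cell in zip(header[1:], row[1:]):
            triples.append((row_label, col_name, cell))
    return triples


def build_prompt(triples: List[Triple], question: str) -> str:
    """Serialize triples as one fact per line, followed by the question."""
    facts = "\n".join(f"({s}, {p}, {o})" for s, p, o in triples)
    return f"Facts:\n{facts}\n\nQuestion: {question}\nAnswer:"


# Made-up table values, used only to show the shape of the output.
header = ["item", "2019", "2018"]
rows = [
    ["revenue", "120.5", "100.0"],
    ["net income", "15.2", "12.1"],
]

triples = table_to_triples(header, rows)
prompt = build_prompt(
    triples,
    "What was the percentage change in revenue from 2018 to 2019?",
)
print(prompt)
```

The resulting prompt string could then be passed to any instruction-tuned model. Listing each value as an explicit fact is one plausible way to spare the model from locating numbers inside a flattened table, which is the bottleneck the abstract describes.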