On the Comprehensibility of Multi-structured Financial Documents using LLMs and Pre-processing Tools

📅 2025-06-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Existing large language models (LLMs) and multimodal LLMs (MLLMs) have limited comprehension of complex structural elements, such as nested tables and multidimensional charts, in heterogeneous financial PDF documents, which leads to hallucinations and factual errors. Method: We systematically evaluate where these models fail to understand structure in multi-structured financial documents and propose a lightweight preprocessing pipeline built from industrial and open-source tools. The pipeline combines PyMuPDF for layout-aware text extraction, Tabula for table detection and reconstruction, and text-layout alignment to parse each document into structured input, without modifying model architectures. Contribution/Results: With preprocessing, GPT-4o accuracy rises from 56.0% to 61.3% and GPT-4 reaches 76.0%, at a lower overall cost. The open-source implementation offers a reproducible, low-cost approach to structured financial document understanding.
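
The preprocessing steps named in the summary can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the Markdown table serialization, and the prompt layout are all assumptions made for this example. It also assumes that PyMuPDF and tabula-py (which needs a Java runtime) are installed; both are imported lazily so the pure helper functions can run without them.

```python
from typing import List, Sequence


def extract_text_blocks(pdf_path: str) -> List[str]:
    """Collect text blocks from every page with PyMuPDF (hypothetical helper)."""
    import fitz  # PyMuPDF, imported lazily

    blocks: List[str] = []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            # Each block is (x0, y0, x1, y1, text, block_no, block_type).
            page_blocks = page.get_text("blocks")
            # Simple layout heuristic (an assumption): top-to-bottom, then left-to-right.
            page_blocks.sort(key=lambda b: (round(b[1]), round(b[0])))
            # block_type 0 marks text blocks; image blocks are skipped.
            blocks.extend(b[4].strip() for b in page_blocks if b[6] == 0 and b[4].strip())
    return blocks


def extract_tables(pdf_path: str) -> List[List[List[str]]]:
    """Detect tables with tabula-py and return them as rows of strings (hypothetical helper)."""
    import tabula  # tabula-py, imported lazily

    frames = tabula.read_pdf(pdf_path, pages="all", multiple_tables=True)
    tables = []
    for df in frames:
        header = [str(c) for c in df.columns]
        body = [[str(v) for v in row] for row in df.fillna("").values.tolist()]
        tables.append([header] + body)
    return tables


def table_to_markdown(rows: Sequence[Sequence[str]]) -> str:
    """Serialize one table (first row is the header) as a Markdown table."""
    header, *body = rows
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


def build_prompt(text_blocks: Sequence[str],
                 tables: Sequence[Sequence[Sequence[str]]],
                 question: str) -> str:
    """Combine extracted text, serialized tables, and the question into one prompt."""
    parts = ["Document text:", *text_blocks]
    for i, table in enumerate(tables, start=1):
        parts += [f"Table {i}:", table_to_markdown(table)]
    parts.append("Question: " + question)
    return "\n\n".join(parts)
```

In this sketch the model only ever sees plain text: `build_prompt(extract_text_blocks(path), extract_tables(path), question)` would produce the string sent to a text-only LLM, which is one plausible reading of how preprocessing can replace direct document input.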

📝 Abstract

The proliferation of complex structured data in hybrid sources, such as PDF documents and web pages, presents unique challenges for current Large Language Models (LLMs) and Multi-modal Large Language Models (MLLMs) in providing accurate answers. Despite the recent advancements of MLLMs, they still often falter when interpreting intricately structured information, such as nested tables and multi-dimensional plots, leading to hallucinations and erroneous outputs. This paper explores the capabilities of LLMs and MLLMs in understanding and answering questions from complex data structures found in PDF documents by leveraging industrial and open-source tools as part of a pre-processing pipeline. Our findings indicate that GPT-4o, a popular MLLM, achieves an accuracy of 56% on multi-structured documents when fed documents directly, and that integrating pre-processing tools raises accuracy to 61.3% for GPT-4o and 76% for GPT-4, at a lower overall cost. The code is publicly available at https://github.com/OGCDS/FinancialQA.
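
To put the reported GPT-4o figures in perspective, the short sketch below computes the absolute and relative gain from the two percentages given in the abstract. The helper name is hypothetical. The abstract gives no direct-input baseline for GPT-4, so only the GPT-4o gain is computed.

```python
from typing import Tuple


def accuracy_gain(baseline_pct: float, improved_pct: float) -> Tuple[float, float]:
    """Return (gain in percentage points, relative gain in percent), rounded to 0.1."""
    difference = improved_pct - baseline_pct
    absolute = round(difference, 1)
    relative = round(100 * difference / baseline_pct, 1)
    return absolute, relative


# GPT-4o: 56% with direct document input, 61.3% with pre-processing (from the abstract).
points, relative = accuracy_gain(56.0, 61.3)
print(f"GPT-4o gain: {points} percentage points ({relative}% relative)")
```

Under these numbers the pre-processing pipeline adds 5.3 percentage points for GPT-4o, a relative improvement of about 9.5%.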

Problem

Research questions and friction points this paper is trying to address.

Challenges in interpreting multi-structured financial documents using LLMs

Improving accuracy of LLMs with pre-processing tools for complex data

Evaluating performance of GPT-4 and GPT-4o on financial QA tasks

Innovation

Methods, ideas, or system contributions that make the work stand out.

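Evaluates LLMs and MLLMs on question answering over multi-structured financial PDF documents

Adds a pre-processing pipeline of industrial and open-source tools (PyMuPDF, Tabula) ahead of the model to improve accuracy

Lowers overall cost while raising accuracy (GPT-4o: 56% to 61.3%; GPT-4: 76%)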
