Problem Solved? Information Extraction Design Space for Layout-Rich Documents using LLMs

📅 2025-02-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses layout-rich document information extraction (IE), systematically exploring the LLM-driven, layout-aware IE design space. It tackles three core challenges—data structuring, model interaction, and output refinement—through layout-aware prompt engineering, document chunking, and multimodal input representation. The work formally defines the complete design space for layout-aware IE for the first time and demonstrates that general-purpose LLMs, when appropriately configured, achieve performance on par with specialized models. An efficient One-Factor-at-a-Time (OFAT) tuning strategy is introduced, yielding a +14.1 F1-point gain over the baseline when benchmarked against LayoutLMv3—nearly matching the +15.1-point gain of exhaustive full-factorial tuning—while substantially reducing token consumption. Additionally, the authors open-source LayIE-LLM, the first comprehensive evaluation suite specifically designed for layout-aware IE with LLMs.

📝 Abstract
This paper defines and explores the design space for information extraction (IE) from layout-rich documents using large language models (LLMs). The three core challenges of layout-aware IE with LLMs are 1) data structuring, 2) model engagement, and 3) output refinement. Our study delves into the sub-problems within these core challenges, such as input representation, chunking, prompting, and the selection of LLMs and multimodal models. It examines the outcomes of different design choices through a new layout-aware IE test suite, benchmarking against the state-of-the-art (SoA) model LayoutLMv3. The results show that the configuration from the one-factor-at-a-time (OFAT) trial achieves near-optimal results with a 14.1-point F1-score gain over the baseline model, while full factorial exploration yields only a slightly higher 15.1-point gain at around 36x the token usage. We demonstrate that well-configured general-purpose LLMs can match the performance of specialized models, providing a cost-effective alternative. Our test suite is freely available at https://github.com/gayecolakoglu/LayIE-LLM.
Problem

Research questions and friction points this paper is trying to address.

Define design space for IE using LLMs
Address challenges in layout-aware IE
Optimize LLM configurations for document extraction
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs for layout-rich documents
One-factor-at-a-time trial
Cost-effective general-purpose LLMs
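The OFAT idea above can be sketched in a few lines: vary one design factor at a time against a fixed baseline, keep each factor's winner, and compare the cost to enumerating every combination. The factor names and the toy scoring function below are illustrative assumptions, not the paper's pipeline:

```python
# Hypothetical OFAT vs. full-factorial sketch over a small design space.
from itertools import product

design_space = {
    "input_repr": ["plain_text", "xml_boxes", "markdown"],
    "chunking": ["none", "page", "sliding_window"],
    "prompt": ["zero_shot", "few_shot"],
}

def evaluate(config):
    """Stand-in for running the IE pipeline and measuring F1."""
    bonus = {"xml_boxes": 0.06, "sliding_window": 0.05, "few_shot": 0.03}
    return 0.60 + sum(bonus.get(v, 0.0) for v in config.values())

baseline = {factor: opts[0] for factor, opts in design_space.items()}

# OFAT: vary one factor at a time, keeping all others at baseline.
ofat = dict(baseline)
for factor, options in design_space.items():
    ofat[factor] = max(options, key=lambda o: evaluate({**baseline, factor: o}))

# Full factorial: score every combination (cost grows multiplicatively).
full = max(
    (dict(zip(design_space, combo)) for combo in product(*design_space.values())),
    key=evaluate,
)
```

In this toy the factor effects are additive, so OFAT lands on the full-factorial optimum with 8 evaluations instead of 18; when factors interact, OFAT can miss the optimum, which matches the paper's report of a near-optimal (14.1 vs. 15.1 points) result at a fraction of the token cost.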
Gaye Colakoglu
Zurich University of Applied Sciences, Switzerland; NEC Laboratories Europe, Heidelberg, Germany
Gürkan Solmaz
Senior Researcher, NEC Laboratories Europe
machine learning · cloud-edge computing · weak supervision · knowledge extraction · digital twins
Jonathan Furst
Zurich University of Applied Sciences, Switzerland