Deep Learning based Key Information Extraction from Business Documents: Systematic Literature Review

📅 2024-07-23
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the low efficiency and poor automation in key information extraction from business documents. We systematically review 96 deep learning methods published between 2017 and 2023, establishing the first comprehensive, rigorously categorized methodology framework for document understanding. A three-dimensional evaluation framework, assessing structural capability, generalization, and annotation efficiency, is proposed, revealing three core bottlenecks: weak cross-domain transferability, poor few-shot adaptability, and absence of end-to-end structured output. Integrating techniques from NLP, document image analysis, sequence modeling, and multimodality, we empirically evaluate mainstream models, including BERT, LayoutLM, Donut, and PPOCR, as well as the pretraining-fine-tuning paradigm. We open-source a reproducible classification matrix and benchmark suite. Our findings identify three priority research directions, providing both theoretical foundations and practical guidance for next-generation intelligent document processing systems.

📝 Abstract
Extracting key information from documents represents a large portion of business workloads and therefore offers a high potential for efficiency improvements and process automation. With recent advances in deep learning, a plethora of deep learning-based approaches for Key Information Extraction have been proposed under the umbrella term Document Understanding that enable the processing of complex business documents. The goal of this systematic literature review is an in-depth analysis of existing approaches in this domain and the identification of opportunities for further research. To this end, 96 approaches published between 2017 and 2023 are analyzed in this study.
Problem

Research questions and friction points this paper is trying to address.

Extracting key information from business documents efficiently
Analyzing Deep Learning approaches for Document Understanding
Identifying research gaps in Key Information Extraction methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Deep Learning for document information extraction
Systematic review of 96 Document Understanding approaches published between 2017 and 2023
Analysis of efficiency improvements in business processes