🤖 AI Summary
To address the poor generalizability and limited scalability of OCR-based and rule-driven approaches in unstructured document information extraction (IE), this paper proposes an end-to-end, fusion-enhanced AI (A²I) framework. The framework synergistically integrates computer vision and natural language processing techniques to enable semantic-aware, adaptive region localization and field understanding. It jointly models OCR, object detection, key-region segmentation, and context-enhanced sequence labeling. Evaluated on a multi-category document dataset, the framework achieves an F1 score of 92.4%, outperforming baseline methods by 11.6%. Crucially, it supports zero-shot field transfer and dynamic template adaptation—overcoming critical performance bottlenecks of conventional methods in label-scarce and large-scale deployment scenarios.
📝 Abstract
Information extraction (IE) is the process of deriving meaningful information from unstructured and unlabeled data. Conventional extraction methods, such as applying OCR and passing the output to a rule-based extraction engine, are inefficient on large volumes of data and have inherent limitations. In this paper, a novel information extraction technique is proposed that combines A²I and computer vision technologies with NLP.
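To make the staged pipeline concrete, the following is a minimal illustrative sketch of the overall flow (OCR output → region detection → field labeling), not the paper's implementation: the `detect_regions` and `extract_fields` functions and the keyword-pairing rule are simplified stand-ins for the learned detection, segmentation, and context-enhanced sequence-labeling models the framework actually uses.

```python
from dataclasses import dataclass

@dataclass
class Region:
    """A detected key region with its semantic label and text content."""
    label: str
    text: str

def detect_regions(ocr_tokens):
    """Hypothetical region detector: pairs key/value tokens by a simple rule,
    standing in for the framework's learned detection and segmentation."""
    regions = []
    for key, value in ocr_tokens:
        regions.append(Region(label=key.lower().rstrip(":"), text=value))
    return regions

def extract_fields(regions, wanted):
    """Stand-in for context-enhanced sequence labeling: keep only the
    regions whose labels match the requested field schema."""
    return {r.label: r.text for r in regions if r.label in wanted}

# Simulated OCR output: (key token, value token) pairs from one document page.
ocr_tokens = [("Invoice:", "INV-1042"), ("Date:", "2024-05-01"), ("Total:", "$310.00")]
fields = extract_fields(detect_regions(ocr_tokens), {"invoice", "date", "total"})
print(fields)
```

In the actual framework, each stage would be a trained model rather than a rule, which is what enables zero-shot field transfer: the field schema passed to the labeling stage can change without retraining the upstream components.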