UNIKIE-BENCH: Benchmarking Large Multimodal Models for Key Information Extraction in Visual Documents

📅 2026-02-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Key information extraction (KIE) from real-world documents remains challenging due to diverse layouts, varying visual quality, and heterogeneous task requirements. This work proposes UNIKIE-BENCH, a unified benchmark built around a dual-track evaluation framework: it assesses practical applicability through constrained categories and open-ended capability through open categories. The benchmark systematically evaluates the end-to-end KIE performance of 15 state-of-the-art large multimodal models across diverse document types. Experimental results reveal significant performance degradation on complex layouts and long-tail fields, highlighting critical bottlenecks in layout awareness and semantic understanding. UNIKIE-BENCH thus establishes a reliable foundation and provides clear guidance for future research in document intelligence.

📝 Abstract
Key Information Extraction (KIE) from real-world documents remains challenging due to substantial variations in layout structures, visual quality, and task-specific information requirements. Recent Large Multimodal Models (LMMs) have shown promising potential for performing end-to-end KIE directly from document images. To enable a comprehensive and systematic evaluation across realistic and diverse application scenarios, we introduce UNIKIE-BENCH, a unified benchmark designed to rigorously evaluate the KIE capabilities of LMMs. UNIKIE-BENCH consists of two complementary tracks: a constrained-category KIE track with scenario-predefined schemas that reflect practical application needs, and an open-category KIE track that extracts any key information that is explicitly present in the document. Experiments on 15 state-of-the-art LMMs reveal substantial performance degradation under diverse schema definitions, long-tail key fields, and complex layouts, along with pronounced performance disparities across different document types and scenarios. These findings underscore persistent challenges in grounding accuracy and layout-aware reasoning for LMM-based KIE. All codes and datasets are available at https://github.com/NEUIR/UNIKIE-BENCH.
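To make the constrained-category track concrete, a typical way to score schema-based KIE is field-level exact matching between a model's predicted key-value pairs and the gold annotation. The sketch below is a minimal, hypothetical scorer; UNIKIE-BENCH's actual metric and matching rules may differ, and the field names are illustrative only.

```python
def kie_scores(pred: dict, gold: dict) -> tuple[float, float, float]:
    """Field-level exact-match precision/recall/F1 for one document.

    Hypothetical scoring sketch, not UNIKIE-BENCH's official metric:
    a predicted field counts as correct only if its key exists in the
    gold annotation and the values match exactly.
    """
    matched = sum(1 for k, v in pred.items() if gold.get(k) == v)
    precision = matched / len(pred) if pred else 0.0
    recall = matched / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Illustrative example: 2 of 3 predicted fields match 2 of 3 gold fields.
p, r, f1 = kie_scores(
    {"invoice_no": "A-102", "total": "42.00", "date": "2026-01-05"},
    {"invoice_no": "A-102", "total": "42.00", "vendor": "Acme"},
)
```

Exact matching is the strictest choice; benchmarks often relax it with normalization (case folding, whitespace stripping) or fuzzy string similarity before comparing values.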
Problem

Research questions and friction points this paper is trying to address.

Key Information Extraction
Large Multimodal Models
Visual Documents
Layout Understanding
Benchmarking
Innovation

Methods, ideas, or system contributions that make the work stand out.

Key Information Extraction
Large Multimodal Models
Document Understanding
Benchmark
Layout-aware Reasoning