UNIKIE-BENCH: Benchmarking Large Multimodal Models for Key Information Extraction in Visual Documents

📅 2026-02-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Key information extraction (KIE) from real-world documents remains challenging due to diverse layouts, varying visual quality, and heterogeneous task requirements. This work proposes UNIKIE-BENCH, a unified benchmark built around a dual-track evaluation framework: it assesses practical applicability through constrained categories and open-ended capability through open categories. The benchmark systematically evaluates the end-to-end KIE performance of 15 state-of-the-art large multimodal models across diverse document types. Experimental results reveal significant performance degradation on complex layouts and long-tail fields, highlighting critical bottlenecks in layout awareness and semantic understanding. UNIKIE-BENCH thus establishes a reliable foundation and provides clear guidance for future research in document intelligence.

📝 Abstract
Key Information Extraction (KIE) from real-world documents remains challenging due to substantial variations in layout structures, visual quality, and task-specific information requirements. Recent Large Multimodal Models (LMMs) have shown promising potential for performing end-to-end KIE directly from document images. To enable a comprehensive and systematic evaluation across realistic and diverse application scenarios, we introduce UNIKIE-BENCH, a unified benchmark designed to rigorously evaluate the KIE capabilities of LMMs. UNIKIE-BENCH consists of two complementary tracks: a constrained-category KIE track with scenario-predefined schemas that reflect practical application needs, and an open-category KIE track that extracts any key information that is explicitly present in the document. Experiments on 15 state-of-the-art LMMs reveal substantial performance degradation under diverse schema definitions, long-tail key fields, and complex layouts, along with pronounced performance disparities across different document types and scenarios. These findings underscore persistent challenges in grounding accuracy and layout-aware reasoning for LMM-based KIE. All codes and datasets are available at https://github.com/NEUIR/UNIKIE-BENCH.
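To make the constrained-category track concrete, a typical way to score schema-based KIE is field-level exact matching between a model's predicted key-value pairs and the gold annotation. The sketch below is a minimal, hypothetical scorer; UNIKIE-BENCH's actual metric and matching rules may differ, and the field names are illustrative only.

```python
def kie_scores(pred: dict, gold: dict) -> tuple[float, float, float]:
    """Field-level exact-match precision/recall/F1 for one document.

    Hypothetical scoring sketch, not UNIKIE-BENCH's official metric:
    a predicted field counts as correct only if its key exists in the
    gold annotation and the values match exactly.
    """
    matched = sum(1 for k, v in pred.items() if gold.get(k) == v)
    precision = matched / len(pred) if pred else 0.0
    recall = matched / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Illustrative example: 2 of 3 predicted fields match 2 of 3 gold fields.
p, r, f1 = kie_scores(
    {"invoice_no": "A-102", "total": "42.00", "date": "2026-01-05"},
    {"invoice_no": "A-102", "total": "42.00", "vendor": "Acme"},
)
```

Exact matching is the strictest choice; benchmarks often relax it with normalization (case folding, whitespace stripping) or fuzzy string similarity before comparing values.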
Problem

Research questions and friction points this paper is trying to address.

Key Information Extraction
Large Multimodal Models
Visual Documents
Layout Understanding
Benchmarking
Innovation

Methods, ideas, or system contributions that make the work stand out.

Key Information Extraction
Large Multimodal Models
Document Understanding
Benchmark
Layout-aware Reasoning