🤖 AI Summary
Existing KIE evaluation metrics (e.g., Span-F1) focus solely on token-level entity matching and fail to reflect the real-world industrial requirement of extracting coherent, structured information groups. Method: We propose KIEval, the first application-oriented KIE evaluation framework, which explicitly incorporates structured grouping capability into the assessment pipeline, jointly modeling entity recognition and hierarchical grouping consistency. Its core innovation lies in unifying semantics-aware entity-level F1 with a novel group-level consistency score, enabling interpretable error diagnosis. Results: Extensive experiments across multiple industrial document datasets demonstrate that KIEval discriminates model utility in production settings more accurately than conventional metrics. By shifting evaluation emphasis from isolated span matching to end-to-end structural understanding, KIEval advances KIE assessment toward practical deployment readiness.
📝 Abstract
Document Key Information Extraction (KIE) is a technology that transforms valuable information in document images into structured data, and it has become an essential function in industrial settings. However, current evaluation metrics for this technology do not accurately reflect the critical attributes of its industrial applications. In this paper, we present KIEval, a novel application-centric evaluation metric for Document KIE models. Unlike prior metrics, KIEval assesses Document KIE models not only on the extraction of individual pieces of information (entities) but also on the extraction of structured information (groups). Evaluating structured information yields an assessment of Document KIE models that better reflects the industrial requirement of extracting grouped information from documents. Because KIEval is designed with industrial applications in mind, we believe it can become a standard evaluation metric for developing or applying Document KIE models in practice. The code will be publicly available.
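To make the entity-versus-group distinction concrete, here is a minimal illustrative sketch, not the paper's actual KIEval formulation: an entity-level micro-F1 that ignores grouping, next to a group-level F1 that credits a predicted group only when it matches a gold group exactly. The data layout (groups as sets of `(field, value)` pairs) and both scoring functions are assumptions for illustration only.

```python
from collections import Counter

def _f1(tp, n_pred, n_gold):
    """Micro F1 from true positives and prediction/gold counts."""
    p = tp / max(n_pred, 1)
    r = tp / max(n_gold, 1)
    return 2 * p * r / max(p + r, 1e-9)

def entity_f1(pred_groups, gold_groups):
    """F1 over individual (field, value) entities, ignoring grouping."""
    pred = Counter(e for g in pred_groups for e in g)
    gold = Counter(e for g in gold_groups for e in g)
    tp = sum((pred & gold).values())  # multiset intersection
    return _f1(tp, sum(pred.values()), sum(gold.values()))

def group_f1(pred_groups, gold_groups):
    """F1 over whole groups: a group counts only on an exact match."""
    pred = Counter(frozenset(g) for g in pred_groups)
    gold = Counter(frozenset(g) for g in gold_groups)
    tp = sum((pred & gold).values())
    return _f1(tp, sum(pred.values()), sum(gold.values()))

gold = [{("name", "Widget"), ("price", "9.99")},
        {("name", "Gadget"), ("price", "4.50")}]
# Same entities, but prices swapped between groups: every entity is
# correct in isolation, yet both extracted groups are wrong.
pred = [{("name", "Widget"), ("price", "4.50")},
        {("name", "Gadget"), ("price", "9.99")}]
print(entity_f1(pred, gold))  # 1.0
print(group_f1(pred, gold))   # 0.0
```

The swapped-price example is exactly the failure mode a span-only metric hides: entity-level F1 reports a perfect score while the structured output is unusable, which is the gap a group-aware metric like KIEval is meant to expose.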