LED Benchmark: Diagnosing Structural Layout Errors for Document Layout Analysis

📅 2025-07-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
Traditional evaluation metrics for document layout analysis—such as IoU and mAP—measure only spatial overlap, failing to detect structural errors like region merging, splitting, or omission, thereby distorting assessments of model structural robustness. To address this, we propose LED, the first benchmark systematically defining eight categories of structural errors and establishing a three-task evaluation framework targeting error detection, error classification, and element-level localization. Leveraging synthetically generated data, we construct LED-Dataset—a dedicated resource enabling structural capability diagnosis for both large language models and multimodal models. Experiments demonstrate that LED effectively uncovers critical layout-perception deficiencies in existing methods, quantifies modality-specific biases and performance trade-offs, and establishes the first standardized, structure-robustness–focused evaluation benchmark for document understanding.

📝 Abstract
Recent advancements in Document Layout Analysis through Large Language Models and Multimodal Models have significantly improved layout detection. However, despite these improvements, challenges remain in addressing critical structural errors, such as region merging, splitting, and missing content. Conventional evaluation metrics like IoU and mAP, which focus primarily on spatial overlap, are insufficient for detecting these errors. To address this limitation, we propose Layout Error Detection (LED), a novel benchmark designed to evaluate the structural robustness of document layout predictions. LED defines eight standardized error types, and formulates three complementary tasks: error existence detection, error type classification, and element-wise error type classification. Furthermore, we construct LED-Dataset, a synthetic dataset generated by injecting realistic structural errors based on empirical distributions from DLA models. Experimental results across a range of LMMs reveal that LED effectively differentiates structural understanding capabilities, exposing modality biases and performance trade-offs not visible through traditional metrics.
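The abstract describes building LED-Dataset by injecting structural errors such as region merging, splitting, and omission into layout predictions. A minimal sketch of what such injection might look like, assuming regions are axis-aligned bounding boxes (the function names and the half-split heuristic are illustrative; the paper's actual eight error types and empirical error distributions are not reproduced here):

```python
# Hypothetical sketch of structural-error injection for layout predictions.
# A region is an (x0, y0, x1, y1) bounding box. Only three of the eight
# error types described in the paper are illustrated, in simplified form.

def inject_merge(regions, i, j):
    """Merge errors: regions i and j collapse into one covering box."""
    a, b = regions[i], regions[j]
    merged = (min(a[0], b[0]), min(a[1], b[1]),
              max(a[2], b[2]), max(a[3], b[3]))
    rest = [r for k, r in enumerate(regions) if k not in (i, j)]
    return rest + [merged]

def inject_split(regions, i):
    """Split errors: region i breaks into two vertical halves."""
    x0, y0, x1, y1 = regions[i]
    mid = (y0 + y1) / 2
    rest = [r for k, r in enumerate(regions) if k != i]
    return rest + [(x0, y0, x1, mid), (x0, mid, x1, y1)]

def inject_missing(regions, i):
    """Omission errors: region i is dropped entirely."""
    return [r for k, r in enumerate(regions) if k != i]
```

Each corrupted layout can then be paired with the clean original to produce labels for the three tasks: error existence, error type, and element-wise error type.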
Problem

Research questions and friction points this paper is trying to address.

Detects structural layout errors in document analysis
Evaluates robustness using eight standardized error types
Addresses limitations of traditional spatial overlap metrics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces Layout Error Detection benchmark
Defines eight standardized error types
Constructs synthetic LED-Dataset from empirical error distributions
Inbum Heo
Chungnam National University, Computer Science & Engineering
Taewook Hwang
Chungnam National University, Computer Science & Engineering
Jeesu Jung
Chungnam National University
Natural Language Processing
Sangkeun Jung
Chungnam National University
Artificial Intelligence, Natural Language Processing, Machine Learning