MPDocBench-Parse: Benchmarking Practical Multi-page Document Parsing

📅 2026-05-21

🤖 AI Summary

Existing document parsing benchmarks are largely confined to single-page or single-task settings, which makes them inadequate for evaluating semantic continuity, hierarchical structure, and visual fidelity in multi-page documents. This work introduces a realistic multi-page document parsing benchmark of 433 meticulously annotated documents (3,246 pages in total) spanning 15 document categories in Chinese and English, enabling end-to-end evaluation at the document level rather than the page level. Its fine-grained evaluation protocol covers text, table, and formula recognition; reading order inference; cross-page content consolidation; and heading hierarchy reconstruction. Experiments show that current models perform reasonably well on basic text extraction but remain substantially deficient in semantic integration, visual parsing, and structural recovery. Together, the benchmark and its protocol provide a unified foundation for advancing multi-page document understanding.
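To make document-level, end-to-end evaluation concrete, here is a minimal sketch that scores a parsed document with normalized edit distance (NED), a metric commonly used for text fidelity in parsing benchmarks. It assumes per-page outputs are simply concatenated and compared once against a document-level reference. The helper names (`levenshtein`, `normalized_edit_distance`, `document_ned`) are hypothetical, and the summary does not state which text metric MPDocBench-Parse actually uses.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string as the row to save memory
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def normalized_edit_distance(pred: str, gt: str) -> float:
    """Edit distance divided by the longer length: 0.0 = identical, 1.0 = fully different."""
    if not pred and not gt:
        return 0.0
    return levenshtein(pred, gt) / max(len(pred), len(gt))


def document_ned(pred_pages: list[str], gt_text: str, sep: str = "\n") -> float:
    """Hypothetical document-level score: join per-page predictions with `sep`,
    then compare once against the whole-document reference text."""
    return normalized_edit_distance(sep.join(pred_pages), gt_text)


# A single paragraph that the page break split in two.
gt = "The results hold across all fifteen categories."
pages = ["The results hold across", "all fifteen categories."]
print(document_ned(pages, gt, sep=" "))  # 0.0: the truncated paragraph is merged cleanly
print(document_ned(pages, gt))           # 1/47, about 0.0213: one stray line break remains
```

The example shows why cross-page consolidation only becomes measurable at document level: a per-page comparison has no reference for the joined paragraph, whereas the document-level score directly penalizes the separator a parser leaves at the page break.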

📝 Abstract

Document parsing converts visually rich documents into machine-readable structured representations, forming a crucial foundation for information systems. Although many benchmarks have been proposed for document parsing, they remain inadequate for realistic scenarios. Existing benchmarks either focus on specific tasks or assess only single-page, text-centric settings, making them insufficient for practical multi-page parsing. Moreover, they lack fine-grained evaluation of semantic continuity, hierarchical structure recovery, and visual content preservation. To address these gaps, we propose MPDocBench-Parse, a benchmark for multi-page document parsing in real-world applications. It contains 433 manually annotated documents with 3,246 pages, covering 15 document types in English and Chinese, with diverse layout styles, and supports document-level end-to-end evaluation. We further design a comprehensive protocol for content fidelity and logical structure, covering text, table, and formula recognition, truncated text and table merging, figure extraction, reading order, and heading hierarchy recovery. Experiments show that, while existing models perform well on basic text extraction, they still suffer clear limitations in semantic continuity integration, visual content parsing, and hierarchical structure recovery. MPDocBench-Parse provides a unified foundation for advancing document parsing toward more realistic scenarios.
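Heading hierarchy recovery can be checked in many ways. One simple option, sketched here under the assumption that the parser emits Markdown with ATX-style `#` headings, is to match headings by title and count how many predicted levels agree with the reference. The functions `extract_headings` and `heading_level_accuracy` are hypothetical illustrations, not the benchmark's protocol, and exact-title matching ignores recognition errors in heading text that a real metric would need to tolerate.

```python
import re

# ATX heading: 1-6 '#' characters, whitespace, then a non-empty title.
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def extract_headings(markdown: str) -> list[tuple[int, str]]:
    """Return (level, title) pairs for ATX-style Markdown headings, in reading order."""
    headings = []
    for line in markdown.splitlines():
        m = HEADING_RE.match(line)
        if m:
            headings.append((len(m.group(1)), m.group(2)))
    return headings


def heading_level_accuracy(pred_md: str, gt_md: str) -> float:
    """Hypothetical metric: fraction of reference headings whose title appears in the
    prediction at the same level. Missing titles count as errors."""
    gt = extract_headings(gt_md)
    if not gt:
        return 1.0  # nothing to recover
    pred_levels: dict[str, int] = {}
    for level, title in extract_headings(pred_md):
        pred_levels.setdefault(title, level)  # keep the first occurrence of each title
    correct = sum(1 for level, title in gt if pred_levels.get(title) == level)
    return correct / len(gt)


gt_md = "# Report\n## Methods\n### Data\n## Results"
pred_md = "# Report\n## Methods\n## Data\n## Results"  # "Data" flattened by one level
print(heading_level_accuracy(pred_md, gt_md))  # 0.75
```

Flattening one third-level heading costs one of four headings here. A stricter variant could also check that each heading's parent is correct, since a single misplaced level changes the whole subtree beneath it.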

Problem

Research questions and friction points this paper is trying to address.

multi-page document parsing

semantic continuity

hierarchical structure recovery

visual content preservation

document parsing benchmark

Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-page document parsing

document structure recovery

semantic continuity