🤖 AI Summary
Existing document classification benchmarks are largely confined to single-domain settings and flat label structures, failing to capture the hierarchical, multimodal, and cross-domain characteristics of real-world business documents. This work proposes MMM-Bench, the first industrial-scale benchmark for multi-level, multi-domain, and multimodal document classification. It comprises 5,990 authentic documents across 12 commercial domains, each annotated with a complete, human-verified classification path in a five-level hierarchical label taxonomy. We systematically identify four core challenges inherent to this task, establish comprehensive baselines with both open-source models and commercial APIs, and validate their efficacy through expert evaluation and empirical experiments. The MMM-Bench dataset and accompanying evaluation toolkit are publicly released to advance research in document intelligence.
📝 Abstract
Document classification forms the backbone of modern enterprise content management, yet existing benchmarks remain trapped in oversimplified paradigms (single-domain settings with flat label structures) that bear little resemblance to the hierarchical, multi-modal, and cross-domain nature of real-world business documents. This gap not only misrepresents practical complexity but also stifles progress toward industrially viable document intelligence. To bridge this gap, we construct the first Multi-level, Multi-domain, Multi-modal document classification Benchmark (MMM-Bench). MMM-Bench includes (1) a deeply hierarchical five-level taxonomy that captures the authentic organizational logic of business documentation; and (2) 5,990 real-world multi-modal documents meticulously curated from 12 commercial domains at Alibaba. Each document is manually annotated with a complete hierarchical path by domain experts. We establish comprehensive baselines on MMM-Bench, covering both open-weight and API-based models. Through systematic experiments, we identify four fundamental challenges within MMM-Bench and propose corresponding insights. To provide a solid foundation for advancing research in multi-level, multi-domain document classification, we release all data and the evaluation toolkit of MMM-Bench at https://github.com/MMMDC-Bench/MMMDC-Bench.
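To make the idea of a "complete hierarchical path" concrete, the sketch below shows one way such an annotation could be represented and scored. It is an illustration only, not the released toolkit's API or the paper's metric: the tuple representation, the function names `matched_depth` and `level_accuracy`, the prefix-match scoring rule, and the example labels are all assumptions.

```python
from typing import Sequence

# Assumed representation: one label per taxonomy level, ordered from
# level 1 (coarsest) to level 5 (finest). Not the benchmark's actual schema.
Path = tuple[str, ...]


def matched_depth(gold: Path, pred: Path) -> int:
    """Return how many leading levels of `pred` agree with `gold`."""
    depth = 0
    for g, p in zip(gold, pred):
        if g != p:
            break
        depth += 1
    return depth


def level_accuracy(
    golds: Sequence[Path], preds: Sequence[Path], levels: int = 5
) -> list[float]:
    """accuracy[k-1] is the share of documents whose first k levels are all correct."""
    if not golds or len(golds) != len(preds):
        raise ValueError("golds and preds must be non-empty and of equal length")
    depths = [matched_depth(g, p) for g, p in zip(golds, preds)]
    return [sum(d >= k for d in depths) / len(depths) for k in range(1, levels + 1)]


# Hypothetical labels, invented purely for illustration.
golds = [
    ("Finance", "Invoices", "VAT", "Domestic", "Paper"),
    ("Legal", "Contracts", "NDA", "Mutual", "Signed"),
]
preds = [
    ("Finance", "Invoices", "VAT", "Export", "Paper"),
    ("Legal", "Contracts", "NDA", "Mutual", "Signed"),
]
scores = level_accuracy(golds, preds)
```

Under this assumed rule, an error at level 4 also counts against level 5, even when the level-5 label happens to match, because the predicted path has already left the gold branch of the taxonomy.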