🤖 AI Summary
Existing information retrieval evaluation benchmarks struggle to assess how well systems retrieve across multiple heterogeneous sources in a unified way, and they offer little support for scenarios that merge results from several categories. To address this gap, this work proposes MIRA, a benchmark for multi-category integrated retrieval built upon a large-scale social science search platform and covering four categories of scholarly resources: publications, research data, variables, and instruments and tools. MIRA provides a unified cross-category test collection built from real user queries and uses a large language model to generate topic descriptions, narratives, and relevance judgments, substantially reducing the labor and cost of building the collection. The benchmark supports category-aware ranking and evaluation, offering a foundational testbed for cross-category information retrieval research.
📝 Abstract
Users increasingly expect modern search systems to offer a unified interface that seamlessly retrieves information from diverse data sources and formats. However, current information retrieval (IR) evaluation benchmarks have not kept pace with this development, primarily due to the lack of test collections that represent the diversity of contemporary search domains. We address this critical gap with MIRA, a novel benchmark based on a large-scale social science search platform. MIRA is designed for category-aware ranking across four heterogeneous categories (Publications, Research Data, Variables, and Instruments & Tools) within a single, unified evaluation framework. The proposed collection is distinctive in several ways: (1) it is built upon real user queries, providing a more realistic basis for evaluation; (2) it covers scholarly items from four distinct categories, enabling multi-faceted evaluation; and (3) it leverages a Large Language Model both to generate topic descriptions and narratives and to assess relevance with respect to these topics, substantially reducing the labor and cost of test collection generation. We release this resource to benefit the community by providing a foundational testbed for research on multi-faceted, category-aware, integrated, or cross-category information retrieval.
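To make the idea of category-aware evaluation concrete, the following is a minimal sketch of how one mixed ranked list could be scored both overall and per category. It is not taken from MIRA: the metric (nDCG@k), the category labels, the function names, and the data layout (`qrels` mapping item ids to graded relevance, `category_of` mapping item ids to a category) are all assumptions made for illustration.

```python
import math

# Hypothetical category labels modeled on the four categories named in the abstract.
CATEGORIES = ("publication", "research_data", "variable", "instrument_tool")


def dcg(gains):
    """Discounted cumulative gain of a list of graded relevance values."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))


def ndcg(ranked_ids, qrels, k=10):
    """nDCG@k for one ranked list; qrels maps item id -> graded relevance."""
    gains = [qrels.get(item, 0) for item in ranked_ids[:k]]
    ideal = sorted(qrels.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


def category_aware_ndcg(ranked_ids, qrels, category_of, k=10):
    """Return overall nDCG@k plus nDCG@k restricted to each category.

    Assumption: the per-category score is computed on the sub-ranking that
    keeps only items of that category, in their original relative order.
    """
    scores = {"all": ndcg(ranked_ids, qrels, k)}
    for cat in CATEGORIES:
        cat_ranking = [i for i in ranked_ids if category_of.get(i) == cat]
        cat_qrels = {i: r for i, r in qrels.items() if category_of.get(i) == cat}
        if cat_qrels:  # skip categories with no judged items for this topic
            scores[cat] = ndcg(cat_ranking, cat_qrels, k)
    return scores
```

Reporting the per-category scores next to the overall score shows whether a system ranks well within each category even when the merged list interleaves categories poorly, which is one plausible reading of "category-aware" evaluation.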