MIRA: An LLM-Assisted Benchmark for Multi-Category Integrated Retrieval

📅 2026-05-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing information retrieval evaluation benchmarks struggle to assess unified retrieval across multi-source, heterogeneous data, and in particular lack test collections for scenarios that merge several result categories. To address this gap, this work proposes MIRA, a benchmark for multi-category integrated retrieval built on a large-scale social science search platform and covering four types of academic resources: publications, research data, variables, and instruments and tools. MIRA provides a unified cross-category test collection constructed from real user queries and uses a large language model to generate topic descriptions, narratives, and relevance judgments, substantially reducing the labor and cost of annotation. The benchmark supports category-aware ranking and evaluation, offering a shared testbed for cross-category information retrieval research.

📝 Abstract

Users increasingly expect modern search systems to offer a unified interface that seamlessly retrieves information from diverse data sources and formats. However, current information retrieval (IR) evaluation benchmarks have not kept pace with this development, primarily due to the lack of test collections that represent the diversity of contemporary search domains. We address this critical gap with MIRA, a novel benchmark based on a large-scale social science search platform. MIRA is designed for category-aware ranking across four heterogeneous categories (Publications, Research Data, Variables, and Instruments & Tools) within a single, unified evaluation framework. The proposed collection is distinctive in several ways: (1) it is built upon real user queries, providing a more realistic basis for evaluation; (2) it covers scholarly items from four distinct categories, enabling multi-faceted evaluation; and (3) it leverages a Large Language Model both to generate topic descriptions and narratives and to assess relevance with respect to these topics, substantially reducing the labor and cost of test collection generation. We release this resource to benefit the community by providing a foundational testbed for research on multi-faceted, category-aware, integrated, or cross-category information retrieval.
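To make "category-aware evaluation" concrete, the following is a minimal sketch of how a single ranked list that mixes the four categories could be scored per category with nDCG. It is an illustration only: the abstract does not state which metrics or file formats MIRA uses, so the choice of nDCG, the graded relevance scale, the function names, and the item identifiers (`pub1`, `data1`, and so on) are all assumptions.

```python
import math

def dcg(gains):
    """Discounted cumulative gain with a log2 rank discount."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))

def ndcg_at_k(ranked_ids, judgments, k):
    """nDCG@k for one ranked list; unjudged items count as gain 0."""
    gains = [judgments.get(item_id, 0) for item_id in ranked_ids[:k]]
    ideal = sorted(judgments.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0

def per_category_ndcg(ranked_ids, judgments, category_of, k=10):
    """Split one mixed ranking by category and score each slice separately."""
    scores = {}
    for cat in sorted({category_of[d] for d in judgments}):
        cat_ranking = [d for d in ranked_ids if category_of.get(d) == cat]
        cat_judgments = {d: g for d, g in judgments.items()
                         if category_of[d] == cat}
        scores[cat] = ndcg_at_k(cat_ranking, cat_judgments, k)
    return scores

# Hypothetical data for one topic (identifiers and grades are made up).
category_of = {
    "pub1": "Publications", "pub2": "Publications",
    "data1": "Research Data", "var1": "Variables",
    "tool1": "Instruments & Tools",
}
judgments = {"pub1": 2, "pub2": 0, "data1": 1, "var1": 1, "tool1": 0}
ranked = ["pub2", "data1", "pub1", "tool1", "var1"]

scores = per_category_ndcg(ranked, judgments, category_of)
for cat, score in scores.items():
    print(f"{cat}: {score:.3f}")
```

Scoring each category slice on its own keeps a system that ranks one category well from masking poor ranking in another, which is one plausible reading of category-aware evaluation. A category with no relevant items for a topic scores 0.0 here; a real evaluation might instead exclude such topic and category pairs from the average.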

Problem

Research questions and friction points this paper is trying to address.

information retrieval

evaluation benchmark

multi-category retrieval

integrated search

heterogeneous data

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-assisted benchmark

multi-category retrieval

category-aware ranking

test collection generation

integrated information retrieval

🔎 Similar Papers

Surveying the MLLM Landscape: A Meta-Review of Current Surveys

2024-09-17 · arXiv.org · Citations: 8