DisastIR: A Comprehensive Information Retrieval Benchmark for Disaster Management

📅 2025-05-20

🤖 AI Summary

Existing information retrieval (IR) benchmarks lack domain-specific evaluation frameworks for disaster management and fail to capture its linguistic complexity and heterogeneous, multi-source information needs.

Method: We introduce DisastIR, the first IR benchmark tailored to disaster management. It comprises 9,600 diverse user queries and more than 1.3 million labeled query-passage pairs, organized into 48 retrieval tasks that cross six search intents with eight disaster categories, which together include 301 specific event types. This task taxonomy (search intent × disaster category, with event types nested inside categories) captures fine-grained, multi-intent retrieval needs in disaster scenarios and supports systematic comparison of retrieval models across tasks.

Contribution/Results: Experiments with 30 state-of-the-art IR models show significant performance variation across tasks, with no single model excelling universally, and significant gaps between performance on general-domain and disaster-specific tasks. These findings underscore the need for disaster-specific evaluation to guide model selection. DisastIR thus provides a rigorous evaluation foundation for advancing disaster-aware IR research.

📝 Abstract

Effective disaster management requires timely access to accurate and contextually relevant information. Existing Information Retrieval (IR) benchmarks, however, focus primarily on general domains or on specialized domains such as medicine or finance, neglecting the unique linguistic complexity and diverse information needs encountered in disaster management scenarios. To bridge this gap, we introduce DisastIR, the first comprehensive IR evaluation benchmark specifically tailored for disaster management. DisastIR comprises 9,600 diverse user queries and more than 1.3 million labeled query-passage pairs, covering 48 distinct retrieval tasks derived from six search intents and eight general disaster categories that include 301 specific event types. Our evaluations of 30 state-of-the-art retrieval models demonstrate significant performance variation across tasks, with no single model excelling universally. Furthermore, comparative analyses reveal significant performance gaps between general-domain and disaster-management-specific tasks, highlighting the necessity of disaster-management-specific benchmarks for guiding IR model selection to support effective decision-making in disaster management scenarios. All source code and the DisastIR benchmark are available at https://github.com/KaiYin97/Disaster_IR.
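The task count above follows from crossing every search intent with every disaster category (6 × 8 = 48). The following is a minimal Python sketch of that grid. It uses hypothetical placeholder labels, because the abstract gives only the counts and not the intent or category names. The 301 event types, which nest inside the categories, are not modeled here.

```python
from itertools import product

# Hypothetical placeholder labels: the abstract states only the counts
# (six search intents, eight disaster categories), not the actual names.
INTENTS = [f"intent_{i}" for i in range(6)]
DISASTER_CATEGORIES = [f"disaster_{j}" for j in range(8)]


def build_task_grid(intents, categories):
    """Return one task identifier per (search intent, disaster category) pair."""
    return [f"{intent}/{category}" for intent, category in product(intents, categories)]


tasks = build_task_grid(INTENTS, DISASTER_CATEGORIES)
print(len(tasks))  # 48, i.e. 6 intents x 8 categories
```

Because each task is a single cell of the intent × category grid, per-task scores can later be grouped along either axis, for example by intent, to compare models.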

Problem

Research questions and friction points this paper is trying to address.

Lack of specialized IR benchmarks for disaster management needs

Existing IR models show inconsistent performance in disaster scenarios

Need for domain-specific evaluation to improve disaster decision-making

Innovation

Methods, ideas, or system contributions that make the work stand out.

First comprehensive IR benchmark for disaster management

Includes 9,600 queries and more than 1.3 million labeled query-passage pairs

Evaluates 30 models across 48 distinct retrieval tasks
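The finding that no single model excels universally can be checked mechanically from a models × tasks score table. The following is a minimal Python sketch of that check. It assumes a hypothetical nested-dictionary layout (`scores[model][task]`) and a higher-is-better metric. The source specifies neither the data layout nor the metric, and the function names are illustrative.

```python
from typing import Dict

# Hypothetical layout: scores[model][task] -> metric value, higher is better.
# The source does not name the evaluation metric; this is an assumption.
Scores = Dict[str, Dict[str, float]]


def best_model_per_task(scores: Scores) -> Dict[str, str]:
    """Map each task to the model with the highest score on that task.

    Ties go to the model that appears first in `scores`.
    """
    tasks = next(iter(scores.values())).keys()
    return {task: max(scores, key=lambda model: scores[model][task]) for task in tasks}


def has_universal_winner(scores: Scores) -> bool:
    """True only if the same model is best on every task."""
    return len(set(best_model_per_task(scores).values())) == 1
```

When applied to a table of 30 models and 48 tasks, `has_universal_winner` returning `False` is the mechanical form of "no single model excelling universally". `best_model_per_task` then shows which model leads each task.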
