🤖 AI Summary
General-purpose text retrieval models exhibit unstable performance in disaster management due to their inability to adapt to diverse, domain-specific search intents. To address this, we propose DMRetriever—the first dedicated dense retrieval model family for disaster management—built upon a three-stage training framework: (1) bidirectional attention adaptation, (2) unsupervised contrastive pre-training, and (3) difficulty-aware progressive instruction fine-tuning, augmented by a high-quality data refinement pipeline. DMRetriever significantly enhances generalization across six representative disaster-related search intents. On multiple benchmarks, it consistently outperforms state-of-the-art baselines: its 596M-parameter variant surpasses a model 13.3× larger, while its lightweight 33M-parameter variant achieves superior performance using only 7.6% of the baselines' parameters. This work is the first to systematically tackle the joint optimization of intent diversity and model efficiency in disaster management retrieval.
📝 Abstract
Effective and efficient access to relevant information is essential for disaster management. However, no retrieval model is specialized for disaster management, and existing general-domain models fail to handle the varied search intents inherent to disaster management scenarios, resulting in inconsistent and unreliable performance. To this end, we introduce DMRetriever, the first series of dense retrieval models (33M to 7.6B parameters) tailored for this domain. It is trained through a novel three-stage framework of bidirectional attention adaptation, unsupervised contrastive pre-training, and difficulty-aware progressive instruction fine-tuning, using high-quality data generated through an advanced data refinement pipeline. Comprehensive experiments demonstrate that DMRetriever achieves state-of-the-art (SOTA) performance across all six search intents at every model scale. Moreover, DMRetriever is highly parameter-efficient, with the 596M model outperforming baselines over 13.3× larger and the 33M model exceeding baselines using only 7.6% of their parameters. All code, data, and checkpoints are available at https://github.com/KaiYin97/DMRETRIEVER
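The contrastive pre-training stage can be illustrated with a minimal sketch of an InfoNCE-style loss with in-batch negatives, a standard objective for training dense retrievers. This is an assumption for illustration only: the abstract does not specify DMRetriever's exact loss, temperature, or negative-sampling scheme.

```python
import math

def info_nce_loss(sim, temperature=0.05):
    """InfoNCE loss with in-batch negatives (illustrative, not the
    paper's exact objective). sim[i][j] is the similarity between
    query i and passage j; the diagonal sim[i][i] is the positive
    pair, and all other passages in the batch act as negatives.
    Returns the mean negative log-likelihood of the positives."""
    losses = []
    for i, row in enumerate(sim):
        logits = [s / temperature for s in row]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        losses.append(log_z - logits[i])
    return sum(losses) / len(sim)

# Toy 2x2 cosine-similarity matrix: diagonal entries are positives
# and are scored higher than the off-diagonal negatives, so the
# loss should be close to zero.
sim = [[0.9, 0.1],
       [0.2, 0.8]]
loss = info_nce_loss(sim)
```

In practice the similarity matrix comes from dot products of encoder embeddings over a large batch, and the low temperature sharpens the softmax so the model is pushed to rank the positive passage above every in-batch negative.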