CrisiSense-RAG: Crisis Sensing Multimodal Retrieval-Augmented Generation for Rapid Disaster Impact Assessment

📅 2026-01-30

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the underestimation of flood extent in automated disaster impact assessment that arises from temporal misalignment between real-time social media reports and satellite imagery acquired after peak conditions. The authors propose a multimodal retrieval-augmented generation framework that asynchronously fuses real-time text with remote sensing imagery, enabling rapid zero-shot assessment without disaster-specific fine-tuning. The framework prioritizes socially sensed data when inferring peak inundation extent and treats imagery as persistent evidence of structural damage. It combines hybrid dense-sparse text retrieval, CLIP-based image retrieval, and a split-pipeline architecture, with prompt-level alignment grounding the quantitative metrics. Evaluated on 207 ZIP-code queries for Hurricane Harvey, the method achieves a mean absolute error (MAE) of 10.94% to 28.40% for flood extent and 16.47% to 21.65% for damage severity, with metric grounding improving damage estimates by up to 4.75 percentage points.
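The summary does not say how the dense and sparse text rankings are combined. As one illustration only, the sketch below uses reciprocal rank fusion, a common technique for merging ranked lists; the choice of technique, the function name, the constant `k=60`, and the document IDs are all assumptions rather than details taken from the paper.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in.
    NOTE: this is an assumed fusion rule, not the paper's stated method.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical rankings for one ZIP-code query.
dense_ranking = ["report_a", "report_b", "report_c"]   # embedding similarity
sparse_ranking = ["report_b", "report_d", "report_a"]  # keyword matching

fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
print(fused)
```

Documents that rank well in both lists rise to the top, which is the usual motivation for pairing a semantic retriever with a keyword retriever.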

📝 Abstract

Timely and spatially resolved disaster impact assessment is essential for effective emergency response. However, automated methods typically struggle with temporal asynchrony: real-time human reports capture peak hazard conditions, whereas high-resolution satellite imagery is frequently acquired after the peak and therefore often reflects flood recession rather than maximum extent. Naive fusion of these misaligned streams can yield dangerous underestimates when post-event imagery overrides documented peak flooding. We present CrisiSense-RAG, a multimodal retrieval-augmented generation framework that reframes impact assessment as evidence synthesis over heterogeneous data sources without disaster-specific fine-tuning. The system employs hybrid dense-sparse retrieval for text sources and CLIP-based retrieval for aerial imagery. A split-pipeline architecture feeds into asynchronous fusion logic that prioritizes real-time social evidence for peak flood extent while treating imagery as persistent evidence of structural damage. Evaluated on Hurricane Harvey across 207 ZIP-code queries, the framework achieves a flood extent MAE of 10.94% to 28.40% and a damage severity MAE of 16.47% to 21.65% in zero-shot settings. Prompt-level alignment proves critical for quantitative validity: metric grounding improves damage estimates by up to 4.75 percentage points. These results demonstrate a practical and deployable approach to rapid resilience intelligence under real-world data constraints.
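The asynchronous fusion idea can be illustrated with a minimal sketch. It assumes each pipeline returns optional percentage estimates per ZIP code; the `Evidence` type, its field names, and the fallback behaviour are hypothetical and are not taken from the paper, which describes the priority rule only at a high level.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Evidence:
    """Per-ZIP-code estimates from one pipeline (hypothetical structure)."""
    flood_extent_pct: Optional[float]  # share of area flooded, 0 to 100
    damage_pct: Optional[float]        # damage severity, 0 to 100


def fuse(text: Evidence, imagery: Evidence) -> Evidence:
    """Prefer text for peak flood extent and imagery for structural damage.

    Assumed rule: each modality is authoritative for one metric, and the
    other modality is used only as a fallback when that estimate is missing.
    """
    flood = (text.flood_extent_pct
             if text.flood_extent_pct is not None
             else imagery.flood_extent_pct)
    damage = (imagery.damage_pct
              if imagery.damage_pct is not None
              else text.damage_pct)
    return Evidence(flood_extent_pct=flood, damage_pct=damage)


# Real-time reports describe peak flooding; post-event imagery shows receded
# water but persistent structural damage (values are made up for illustration).
text_evidence = Evidence(flood_extent_pct=40.0, damage_pct=10.0)
image_evidence = Evidence(flood_extent_pct=5.0, damage_pct=25.0)

result = fuse(text_evidence, image_evidence)
print(result)
```

Under this rule the low post-event flood reading from imagery never overrides the peak extent documented in the text stream, which is the failure mode the abstract attributes to naive fusion.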

Problem

Research questions and friction points this paper is trying to address.

disaster impact assessment

temporal asynchrony

multimodal fusion

flood extent estimation

real-time crisis sensing

Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal retrieval-augmented generation

asynchronous data fusion

zero-shot disaster assessment