🤖 AI Summary
To address the challenge of jointly modeling heterogeneous multimodal data—such as satellite/aerial imagery, time-series meteorological measurements, and textual incident reports—in disaster management, this paper introduces the first domain-specific multimodal large language model (MLLM) tailored for disaster classification. Methodologically, the authors propose a cross-modal attention mechanism and an adaptive Transformer architecture, combined with multi-source data alignment and large-scale joint pretraining, to achieve deep fusion of visual, temporal meteorological, and textual features. The key contribution is the adaptation of the LLM paradigm to multimodal disaster scene understanding, improving both generalization and decision interpretability. Extensive experiments show state-of-the-art performance on multimodal disaster classification: 89.5% accuracy, 88.0% F1-score, 0.92 AUC, and 0.88 BERTScore, consistently outperforming existing approaches.
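The summary describes cross-modal attention over image, weather, and text features followed by a Transformer-based fusion stage. Below is a minimal sketch of that general recipe; the module layout, feature dimensions, and number of disaster classes are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of cross-modal attention fusion for disaster classification,
# assuming pooled per-modality features are projected to a shared space,
# fused with cross-attention, jointly encoded, and classified.
# All dimensions and module names are illustrative assumptions.
import torch
import torch.nn as nn


class CrossModalFusion(nn.Module):
    """Fuses visual, meteorological, and textual embeddings with cross-attention."""

    def __init__(self, dim: int = 256, num_heads: int = 4, num_classes: int = 10):
        super().__init__()
        # Per-modality projections into a shared embedding space (assumed input sizes).
        self.img_proj = nn.Linear(512, dim)   # e.g. pooled vision-encoder features
        self.met_proj = nn.Linear(64, dim)    # e.g. summarized weather time series
        self.txt_proj = nn.Linear(768, dim)   # e.g. pooled text-encoder features
        # Text tokens attend to image and weather tokens (cross-modal attention).
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # A small Transformer encoder stands in for the "adaptive Transformer" stage.
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, img_feat, met_feat, txt_feat):
        # Each input: (batch, feature_dim); project and add a token axis.
        img = self.img_proj(img_feat).unsqueeze(1)
        met = self.met_proj(met_feat).unsqueeze(1)
        txt = self.txt_proj(txt_feat).unsqueeze(1)
        context = torch.cat([img, met], dim=1)              # (batch, 2, dim)
        fused, _ = self.cross_attn(txt, context, context)   # text queries the other modalities
        tokens = torch.cat([txt + fused, context], dim=1)   # residual fusion, then joint encoding
        encoded = self.encoder(tokens)
        return self.classifier(encoded.mean(dim=1))         # (batch, num_classes) logits


if __name__ == "__main__":
    model = CrossModalFusion()
    logits = model(torch.randn(2, 512), torch.randn(2, 64), torch.randn(2, 768))
    print(logits.shape)  # torch.Size([2, 10])
```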
📝 Abstract
Effective disaster management requires timely and accurate insights, yet traditional methods struggle to integrate multimodal data such as images, weather records, and textual reports. To address this, we propose DisasterNet-LLM, a specialized Large Language Model (LLM) designed for comprehensive disaster analysis. By leveraging advanced pretraining, cross-modal attention mechanisms, and adaptive transformers, DisasterNet-LLM excels in disaster classification. Experimental results demonstrate its superiority over state-of-the-art models, achieving an accuracy of 89.5%, an F1-score of 88.0%, an AUC of 0.92, and a BERTScore of 0.88 on multimodal disaster classification tasks.
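The reported figures cover classification metrics (accuracy, F1, AUC) plus BERTScore for generated text. The snippet below is a small sketch, with placeholder labels and probabilities, of how the classification metrics can be computed with scikit-learn; it is not tied to the paper's evaluation code, and BERTScore (which compares generated text against references) is omitted.

```python
# Sketch of the reported classification metrics (accuracy, macro F1, one-vs-rest AUC),
# assuming a multi-class setup; labels and probabilities below are placeholder values.
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

y_true = np.array([0, 2, 1, 2, 0, 1])        # ground-truth disaster classes
y_prob = np.array([[0.7, 0.2, 0.1],          # predicted class probabilities per sample
                   [0.1, 0.2, 0.7],
                   [0.2, 0.6, 0.2],
                   [0.2, 0.3, 0.5],
                   [0.8, 0.1, 0.1],
                   [0.3, 0.5, 0.2]])
y_pred = y_prob.argmax(axis=1)               # hard predictions for accuracy and F1

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Macro F1 :", f1_score(y_true, y_pred, average="macro"))
print("AUC (OvR):", roc_auc_score(y_true, y_prob, multi_class="ovr"))
```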