DisasterVQA: A Visual Question Answering Benchmark Dataset for Disaster Scenes

📅 2026-01-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of current visual question answering (VQA) models in supporting the safety-critical, complex reasoning required in disaster scenarios, a shortcoming that limits their utility in emergency response. To bridge this gap, the authors introduce DisasterVQA, the first VQA benchmark specifically designed for disaster response, comprising 1,395 real-world social media images and 4,405 expert-annotated question-answer pairs covering events such as floods, wildfires, and earthquakes. Notably, the dataset integrates humanitarian response frameworks, including FEMA's Emergency Support Functions (ESF) and OCHA's MIRA methodology, into its design to support situational awareness and operational decision-making tasks. DisasterVQA encompasses binary, multiple-choice, and open-ended questions; evaluations of seven state-of-the-art vision-language models reveal significant deficiencies in fine-grained quantitative reasoning, object counting, and performance on low-frequency disaster contexts.

📝 Abstract
Social media imagery provides a low-latency source of situational information during natural and human-induced disasters, enabling rapid damage assessment and response. While Visual Question Answering (VQA) has shown strong performance in general-purpose domains, its suitability for the complex and safety-critical reasoning required in disaster response remains unclear. We introduce DisasterVQA, a benchmark dataset designed for perception and reasoning in crisis contexts. DisasterVQA consists of 1,395 real-world images and 4,405 expert-curated question-answer pairs spanning diverse events such as floods, wildfires, and earthquakes. Grounded in humanitarian frameworks including FEMA ESF and OCHA MIRA, the dataset includes binary, multiple-choice, and open-ended questions covering situational awareness and operational decision-making tasks. We benchmark seven state-of-the-art vision-language models and find performance variability across question types, disaster categories, regions, and humanitarian tasks. Although models achieve high accuracy on binary questions, they struggle with fine-grained quantitative reasoning, object counting, and context-sensitive interpretation, particularly for underrepresented disaster scenarios. DisasterVQA provides a challenging and practical benchmark to guide the development of more robust and operationally meaningful vision-language models for disaster response. The dataset is publicly available at https://zenodo.org/records/18267770.
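To make the benchmark's evaluation protocol concrete, the sketch below shows one way per-question-type exact-match accuracy could be computed over the dataset. The annotation file name (disastervqa_annotations.json) and the record fields (image_path, question, answer, question_type) are hypothetical placeholders rather than the actual schema of the Zenodo record, and open-ended answers would in practice need a more tolerant scoring scheme than exact match.

```python
import json
from collections import defaultdict


def load_annotations(path):
    """Load DisasterVQA-style annotations.

    Assumes (hypothetically) a JSON list of records with fields:
    image_path, question, answer, and question_type, where question_type
    is one of "binary", "multiple_choice", or "open_ended".
    """
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def normalize(text):
    """Lowercase and strip whitespace/trailing periods for exact-match scoring."""
    return text.strip().lower().rstrip(".")


def evaluate(annotations, predict_fn):
    """Compute exact-match accuracy per question type.

    predict_fn(image_path, question) -> answer string, wrapping whichever
    vision-language model is being benchmarked.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for item in annotations:
        qtype = item["question_type"]
        pred = predict_fn(item["image_path"], item["question"])
        total[qtype] += 1
        if normalize(pred) == normalize(item["answer"]):
            correct[qtype] += 1
    return {qt: correct[qt] / total[qt] for qt in total}


if __name__ == "__main__":
    anns = load_annotations("disastervqa_annotations.json")
    # Trivial baseline: always answer "yes"; replace with a real VLM call.
    scores = evaluate(anns, lambda image_path, question: "yes")
    for qtype, acc in sorted(scores.items()):
        print(f"{qtype}: {acc:.3f}")
```

Reporting accuracy separately per question type mirrors the paper's observation that binary questions are much easier for current models than counting or open-ended quantitative questions; a per-disaster-category breakdown could be added in the same way by grouping on an event-type field, if the released annotations include one.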
Problem

Research questions and friction points this paper is trying to address.

Visual Question Answering
Disaster Response
Situational Awareness
Vision-Language Models
Benchmark Dataset
Innovation

Methods, ideas, or system contributions that make the work stand out.

DisasterVQA
Visual Question Answering
Disaster Response
Vision-Language Models
Humanitarian AI