From Pixels to Semantics: A Multi-Stage AI Framework for Structural Damage Detection in Satellite Imagery

📅 2026-03-23

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses unreliable building damage assessment from low-resolution remote sensing imagery after natural disasters. The authors propose a multi-stage AI framework that first applies a Video Restoration Transformer (VRT) for super-resolution, upscaling imagery from 1024x1024 to 4096x4096, then uses YOLOv11 to localize buildings in pre-disaster imagery, and finally applies vision-language models (VLMs) to the cropped building regions to classify damage into four severity levels. Because no ground-truth captions are available, the work uses CLIPScore for reference-free semantic alignment and introduces a VLM-as-a-Jury strategy that combines multiple VLMs to reduce individual model bias in safety-critical decisions. On the Moore Tornado and Hurricane Matthew subsets of the xBD dataset, the framework improves the semantic interpretation of damaged buildings and produces recovery recommendations for first responders.
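One way to picture the VLM-as-a-Jury step is as a vote over the per-model damage labels for a single building crop. The sketch below is illustrative only: the summary does not say how the paper aggregates votes, so the majority rule, the tie-break toward the more severe label, the label names (borrowed from the xBD damage scale), and the function name `jury_verdict` are all assumptions.

```python
from collections import Counter

# Assumed label set, ordered from least to most severe (xBD-style names).
SEVERITY_LEVELS = ["no-damage", "minor-damage", "major-damage", "destroyed"]


def jury_verdict(votes):
    """Aggregate one label per VLM into a single verdict.

    Returns (label, agreement), where agreement is the fraction of
    jurors that voted for the winning label. Ties are resolved toward
    the more severe label, a conservative assumption for
    safety-critical use, not a rule taken from the paper.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    unknown = [v for v in votes if v not in SEVERITY_LEVELS]
    if unknown:
        raise ValueError(f"unknown labels: {unknown}")

    counts = Counter(votes)
    top = max(counts.values())
    # All labels that share the highest vote count.
    tied = [label for label, n in counts.items() if n == top]
    # Pick the most severe label among the tied candidates.
    verdict = max(tied, key=SEVERITY_LEVELS.index)
    return verdict, top / len(votes)


if __name__ == "__main__":
    # Three hypothetical VLM outputs for the same building crop.
    print(jury_verdict(["major-damage", "major-damage", "destroyed"]))
```

The agreement fraction is returned alongside the label so that low-consensus buildings can be flagged for human review rather than trusted blindly.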

📝 Abstract

Rapid and accurate structural damage assessment following natural disasters is critical for effective emergency response and recovery. However, remote sensing imagery often suffers from low spatial resolution, contextual ambiguity, and limited semantic interpretability, reducing the reliability of traditional detection pipelines. In this work, we propose a novel hybrid framework that integrates AI-based super-resolution, deep learning object detection, and Vision-Language Models (VLMs) for comprehensive post-disaster building damage assessment. First, we enhance pre- and post-disaster satellite imagery using a Video Restoration Transformer (VRT) to upscale images from 1024x1024 to 4096x4096 resolution, improving structural detail visibility. Next, a YOLOv11-based detector localizes buildings in pre-disaster imagery, and cropped building regions are analyzed using VLMs to semantically assess structural damage across four severity levels. To ensure robust evaluation in the absence of ground-truth captions, we employ CLIPScore for reference-free semantic alignment and introduce a multi-model VLM-as-a-Jury strategy to reduce individual model bias in safety-critical decision making. Experiments on subsets of the xBD dataset, including the Moore Tornado and Hurricane Matthew events, demonstrate that the proposed framework enhances the semantic interpretation of damaged buildings. In addition, our framework provides helpful recommendations to first responders for recovery based on damage analysis.
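For readers unfamiliar with CLIPScore, the reference-free metric is commonly defined as a rescaled, clipped cosine similarity between a CLIP image embedding and a CLIP text embedding, with a scale factor of 2.5 in the original formulation. The sketch below computes that quantity from two embedding vectors that are assumed to have been produced already; whether the paper uses this exact rescaling is an assumption, and the function name `clip_score` is hypothetical.

```python
import math


def clip_score(image_emb, text_emb, w=2.5):
    """Reference-free CLIPScore: w * max(cos(image_emb, text_emb), 0).

    image_emb and text_emb are plain sequences of floats standing in
    for CLIP embeddings of a building crop and a generated damage
    description. w=2.5 follows the common CLIPScore definition; the
    paper's exact setting is not stated in the abstract.
    """
    if len(image_emb) != len(text_emb):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(image_emb, text_emb))
    norm = math.sqrt(sum(a * a for a in image_emb)) * math.sqrt(
        sum(b * b for b in text_emb)
    )
    if norm == 0:
        raise ValueError("embeddings must be non-zero")
    # Negative similarities are clipped to zero before rescaling.
    return w * max(dot / norm, 0.0)


if __name__ == "__main__":
    # Toy 2-D vectors in place of real CLIP embeddings.
    print(clip_score([1.0, 0.0], [1.0, 1.0]))
```

Because the score needs only the image and the candidate text, it can rank VLM descriptions of a damaged building without any ground-truth caption, which is the situation the abstract describes.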

Problem

Research questions and friction points this paper is trying to address.

structural damage detection

satellite imagery

semantic interpretability

disaster response

remote sensing

Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision-Language Models

Super-resolution

Structural Damage Assessment