🤖 AI Summary
In humanitarian assistance and disaster response (HADR), infrastructure damage assessment for buildings and roads faces three key challenges: severe class imbalance, scarcity of moderately damaged samples, and substantial noise in pixel-level manual annotations. To address these, this work integrates vision-language models (VLMs) into disaster damage data generation. By jointly leveraging remote sensing imagery and human semantic priors, the approach enables semantic-guided, fine-grained, and diverse synthetic damage image generation, mitigating annotation noise and augmenting hard-to-classify samples. Initial experiments suggest encouraging generation quality and improved classification of scenes with different levels of structural damage across multiple infrastructure types, including buildings and roads.
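The paper does not include an implementation, but one plausible realization of semantic-guided generation is text-conditioned image-to-image diffusion: a pre-disaster satellite tile plus a damage-level prompt yields a synthetic damaged variant. The sketch below assumes the Hugging Face `diffusers` library and a Stable Diffusion checkpoint; the model choice, prompts, and `strength` value are illustrative assumptions, not the authors' pipeline.

```python
# Hypothetical sketch: prompt-conditioned synthetic damage generation.
# Assumes the Hugging Face `diffusers` library; the model, prompts, and
# strength value are illustrative, not the paper's actual pipeline.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Damage-level prompts encode the human semantic prior; varying them
# yields fine-grained, diverse synthetic samples for rare classes.
DAMAGE_PROMPTS = {
    "minor":    "satellite view of a building with minor roof damage",
    "moderate": "satellite view of a partially collapsed building",
    "severe":   "satellite view of a destroyed building, rubble and debris",
}

def synthesize(pre_disaster_tile: Image.Image, level: str) -> Image.Image:
    """Generate a synthetic post-disaster tile at the requested damage level."""
    return pipe(
        prompt=DAMAGE_PROMPTS[level],
        image=pre_disaster_tile.convert("RGB").resize((512, 512)),
        strength=0.6,          # how far to drift from the pre-disaster tile
        guidance_scale=7.5,    # adherence to the damage-level prompt
    ).images[0]

tile = Image.open("pre_disaster_tile.png")
synthetic = synthesize(tile, "moderate")
synthetic.save("synthetic_moderate.png")
```

Varying the prompt and `strength` per sample is one way to obtain the diversity the summary describes, especially for the scarce moderate-damage class.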
📝 Abstract
Prompt and accurate damage assessment is of crucial importance in humanitarian assistance and disaster response (HADR). Current deep learning approaches struggle to generalize in HADR settings due to class imbalance, the scarcity of moderate-damage examples, and inaccurate pixel-level human labeling. To address these limitations, state-of-the-art vision-language models (VLMs) that fuse imagery with human semantic knowledge offer an opportunity to generate a diverse set of image-based damage data effectively. Our initial experimental results suggest encouraging data generation quality, which yields an improvement in classifying scenes with different levels of structural damage to buildings, roads, and other infrastructure.
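As an illustration of how such synthetic data might counter the class imbalance the abstract describes, the sketch below tops up underrepresented damage classes to a common target count before training. The class names, counts, and `synthesize` helper are hypothetical, not part of the paper.

```python
# Hypothetical sketch: balancing a damage-classification training set
# with synthetic samples. `synthesize` stands in for a VLM-guided
# generator (see the sketch above); class names and counts are assumed.
import random
from collections import Counter

def balance_with_synthetic(samples, synthesize, target_per_class=None):
    """Augment minority classes with synthetic (image, label) pairs.

    samples: list of (image_path, damage_level) pairs from real annotations.
    synthesize: callable(level) -> synthetic image for that damage level.
    """
    counts = Counter(label for _, label in samples)
    target = target_per_class or max(counts.values())

    augmented = list(samples)
    for level, count in counts.items():
        # Only classes below the target receive synthetic samples.
        for _ in range(target - count):
            augmented.append((synthesize(level), level))
    random.shuffle(augmented)
    return augmented

# Example: moderate damage is the scarce class, so it receives the
# most synthetic samples.
real = [("a.png", "minor")] * 500 + [("b.png", "moderate")] * 40 \
     + [("c.png", "severe")] * 300
balanced = balance_with_synthetic(real, synthesize=lambda lvl: f"synthetic_{lvl}.png")
print(Counter(label for _, label in balanced))  # 500 per class
```

Oversampling to a fixed per-class target is only one design choice; weighted sampling or loss reweighting over the mixed real-plus-synthetic set would serve the same goal.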